scatterplot function - RDocumentation (2024)

Description

This function uses basic R graphics to draw a two-dimensional scatterplot, with options to allow for plot enhancements that are often helpful with regression problems. Enhancements include adding marginal boxplots, estimated mean and variance functions using either parametric or nonparametric methods, point identification, jittering, setting characteristics of points and lines like color, size and symbol, marking points and fitting lines conditional on a grouping variable, and other enhancements.

sp is an abbreviation for scatterplot.

Usage

scatterplot(x, ...)

# S3 method for formula
scatterplot(formula, data, subset, xlab, ylab, id=FALSE, legend=TRUE, ...)

# S3 method for default
scatterplot(x, y, boxplots=if (by.groups) "" else "xy",
  regLine=TRUE, legend=TRUE, id=FALSE, ellipse=FALSE, grid=TRUE,
  smooth=TRUE, groups, by.groups=!missing(groups),
  xlab=deparse(substitute(x)), ylab=deparse(substitute(y)),
  log="", jitter=list(), cex=par("cex"),
  col=carPalette()[-1], pch=1:n.groups,
  reset.par=TRUE, ...)

sp(x, ...)

Value

If points are identified, their labels are returned; otherwise NULL is returned invisibly.

Arguments

x

vector of horizontal coordinates (or first argument of generic function).

y

vector of vertical coordinates.

formula

a model formula, of the form y ~ x or, if plotting by groups, y ~ x | z, where z evaluates to a factor or other variable dividing the data into groups. If x is a factor, then parallel boxplots are produced using the Boxplot function.

data

data frame within which to evaluate the formula.

subset

expression defining a subset of observations.

boxplots

if "x", a marginal boxplot for the horizontal x-axis is drawn below the plot; if "y", a marginal boxplot for the vertical y-axis is drawn to the left of the plot; if "xy", both marginal boxplots are drawn; set to "" or FALSE to suppress both boxplots.

regLine

controls adding a fitted regression line to the plot. If regLine=FALSE, no line is drawn. If TRUE, the default, an OLS line is fit. This argument can also be a list. The default of TRUE is equivalent to regLine=list(method=lm, lty=1, lwd=2, col=col), which uses the lm function to estimate the fitted line and draws a solid line (lty=1) of twice the nominal width (lwd=2) in the color given by the first element of the col argument described below.

legend

when the plot is drawn by groups and legend=TRUE, controls placement and properties of a legend; if FALSE, the legend is suppressed. Can be a list of named arguments, as follows: title for the legend; inset, giving space as a proportion of the axes to offset the legend from the axes; coords specifying the position of the legend in any form acceptable to the legend function or, if not given, the legend is placed above the plot in the upper margin; columns for the legend, determined automatically to prefer a horizontal layout if not given explicitly; cex giving the relative size of the legend symbols and text. TRUE (the default) is equivalent to list(title=deparse(substitute(groups)), inset=0.02, cex=1).

id

controls point identification; if FALSE (the default), no points are identified; can be a list of named arguments to the showLabels function; TRUE is equivalent to list(method="mahal", n=2, cex=1, col=carPalette()[-1], location="lr"), which identifies the 2 points (in each group) with the largest Mahalanobis distances from the center of the data. See showLabels for a description of the other arguments. The default behavior of id is not the same in all graphics functions in car, as the method used depends on the type of plot.

ellipse

controls plotting data-concentration ellipses. If FALSE (the default), no ellipses are plotted. Can be a list of named values giving levels, a vector of one or more bivariate-normal probability-contour levels at which to plot the ellipses; robust, a logical value determining whether to use the cov.trob function in the MASS package to calculate the center and covariance matrix for the data ellipses; and fill and fill.alpha, which control whether the ellipse is filled and the transparency of the fill. TRUE is equivalent to list(levels=c(.5, .95), robust=TRUE, fill=TRUE, fill.alpha=0.2).

grid

if TRUE, the default, a light-gray background grid is put on the graph.

smooth

specifies a nonparametric estimate of the mean or median function of the vertical axis variable given the horizontal axis variable and optionally a nonparametric estimate of the conditional variance. If smooth=FALSE, neither function is drawn. If smooth=TRUE, then both the mean function and the variance function are drawn for ungrouped data, and only the mean function is drawn for grouped data. The default smoother is loessLine, which uses the loess function from the stats package. This smoother is fast and reliable. See the details below for changing the smoother, the line type, width and color of the added lines, and adding arguments for the smoother.

groups

a factor or other variable dividing the data into groups; groups are plotted with different colors, plotting characters, fits, and smooths. Using this argument is equivalent to specifying the grouping variable in the formula.

by.groups

if TRUE (the default when there are groups), regression lines are fit by groups.

xlab

label for horizontal axis.

ylab

label for vertical axis.

log

same as the log argument to plot, to produce log axes.
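The list forms of regLine, legend, id and ellipse described above can be combined in a single call. The following is a minimal sketch, assuming the car package (which attaches carData and its Prestige data) is installed; the particular values chosen are illustrative, not recommendations.

```r
library(car)  # attaches carData, which provides the Prestige data

# Each list overrides only the named defaults documented for that argument.
scatterplot(prestige ~ income | type, data=Prestige,
  regLine=list(method=lm, lty=2, lwd=1),               # dashed, thinner OLS lines
  legend=list(title="Occupation type", coords="topleft", cex=0.8),
  id=list(method="mahal", n=1),                        # label one point per group
  ellipse=list(levels=0.5, robust=FALSE, fill=FALSE),  # 50% ellipse, unfilled
  smooth=FALSE)                                        # no nonparametric smooth
```

Any sub-argument left out of a list keeps the default value given in the argument's description.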

Author

John Fox jfox@mcmaster.ca

Details

Many arguments to scatterplot were changed in version 3 of car to simplify use of this function.

The smooth argument is used to control adding smooth curves to the plot to estimate the conditional center of the vertical axis variable given the horizontal axis variable, and also the conditional variability. Setting smooth=FALSE omits all smoothers, while smooth=TRUE, the default, includes default smoothers. Alternatively smooth can be set to a list of subarguments that provide finer control over the smoothing.

The default behavior of smooth=TRUE is equivalent to smooth=list(smoother=loessLine, var=!by.groups, lty.var=2, lty.var=4, style="filled", alpha=0.15, border=TRUE, vertical=TRUE), specifying the default loessLine smoother for the conditional mean smooth and variance smooth. The color of the smooths is the same as the color of the points, but this can be changed with the arguments col.smooth and col.var.

Additional available smoothers are gamLine, which uses the gam function, and quantregLine, which uses quantile regression to estimate the median and quartile functions using rqss. All of these smoothers have one or more arguments described on their help pages, and these arguments can be added to the smooth argument; for example, smooth = list(span=1/2) would use the default loessLine smoother, include the variance smooth, and change the value of the smoothing parameter to 1/2.
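As a brief illustration of passing sub-arguments through smooth, the sketch below assumes car is installed along with the packages its smoothers rely on (mgcv for gamLine, quantreg for quantregLine). The colors are arbitrary choices.

```r
library(car)  # attaches carData, which provides the Prestige data

# Default loessLine smoother with span 1/2 and custom colors for the
# mean smooth (col.smooth) and the variance smooth (col.var).
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(span=1/2, col.smooth="black", col.var="gray40"))

# Swap in the gam-based smoother.
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(smoother=gamLine))

# Swap in quantile regression: median plus lower and upper quartiles.
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(smoother=quantregLine))
```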

For loessLine and gamLine the variance smooth is estimated by separately smoothing the squared positive and negative residuals from the mean smooth, using the same type of smoother. The displayed curves are equal to the mean smooth plus the square root of the smooth of the positive squared residuals, and the mean smooth minus the square root of the smooth of the negative squared residuals. The lines therefore represent the conditional variability at each value on the horizontal axis. Because smoothing is done separately for positive and negative residuals, the variation shown will generally not be symmetric about the fitted mean function. For the quantregLine method, the center estimates the conditional median, and the variability estimates the lower and upper quartiles of the estimated conditional distribution.
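The idea behind the variance smooth can be sketched in base R with loess. This is an illustration of the procedure described above on simulated data, not the code car uses; the variable names, the span, and the clipping of negative fitted values at zero are all assumptions made for the sketch.

```r
# Simulated data whose spread grows with x (illustrative only).
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- 2 + 0.5 * x + rnorm(200, sd = 0.3 + 0.2 * x)

# Step 1: mean smooth and its residuals.
mean_fit <- loess(y ~ x, span = 2/3)
res <- y - fitted(mean_fit)

# Step 2: smooth the squared residuals separately by sign.
d <- data.frame(x = x, r2 = res^2)
pos_fit <- loess(r2 ~ x, data = d[res > 0, ], span = 2/3)
neg_fit <- loess(r2 ~ x, data = d[res < 0, ], span = 2/3)

# Step 3: evaluate on a grid covered by both fits (loess does not extrapolate).
lo <- max(min(x[res > 0]), min(x[res < 0]))
hi <- min(max(x[res > 0]), max(x[res < 0]))
grid <- data.frame(x = seq(lo, hi, length.out = 100))
mu_grid <- predict(mean_fit, grid)

# Step 4: mean plus/minus the square root of each smooth
# (negative fitted values are clipped at zero before the square root).
upper <- mu_grid + sqrt(pmax(predict(pos_fit, grid), 0))
lower <- mu_grid - sqrt(pmax(predict(neg_fit, grid), 0))

plot(x, y)
lines(grid$x, mu_grid, lwd = 2)
lines(grid$x, upper, lty = 2)
lines(grid$x, lower, lty = 2)
```

Because the two sides are fit independently, the distance from the mean curve to the upper curve need not match the distance to the lower curve.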

The default depiction of the variance functions is via a shaded envelope between the upper and lower estimates of variability. Setting the subargument style="lines" will display only the boundaries of this region, and style="none" suppresses variance smoothing.

For style="filled" several subarguments modify the appearance of the region: alpha is a number between 0 and 1 that specifies opacity, with default 0.15; border, either TRUE or FALSE, specifies whether a border is drawn for the envelope; and vertical, either TRUE or FALSE, modifies the behavior of the envelope at the edges of the graph.
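The three style settings can be compared side by side. A minimal sketch, assuming car is installed; the alpha value is an arbitrary choice.

```r
library(car)  # attaches carData, which provides the Prestige data

# Shaded envelope (the default style), made more opaque and without a border.
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(style="filled", alpha=0.3, border=FALSE))

# Only the boundaries of the variability region.
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(style="lines"))

# Mean smooth only; variance smoothing is suppressed.
scatterplot(prestige ~ income, data=Prestige,
  smooth=list(style="none"))
```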

The sub-arguments spread, lty.spread and col.spread of the smooth argument are equivalent to the newer var, lty.var and col.var, respectively, recognizing that the spread is a measure of conditional variability.

References

Fox, J. and Weisberg, S. (2019) An R Companion to Applied Regression, Third Edition, Sage.

Examples

library(car)  # attaches carData, which provides Prestige, Vocab and UN

scatterplot(prestige ~ income, data=Prestige, ellipse=TRUE,
  smooth=list(style="lines"))

scatterplot(prestige ~ income, data=Prestige,
  smooth=list(smoother=quantregLine))

scatterplot(prestige ~ income, data=Prestige,
  smooth=list(smoother=quantregLine, border="FALSE"))

# use quantile regression for median and quartile fits
scatterplot(prestige ~ income | type, data=Prestige,
  smooth=list(smoother=quantregLine, var=TRUE, span=1, lwd=4, lwd.var=2))

scatterplot(prestige ~ income | type, data=Prestige,
  legend=list(coords="topleft"))

scatterplot(vocabulary ~ education, jitter=list(x=1, y=1),
  data=Vocab, smooth=FALSE, lwd=3)

scatterplot(infantMortality ~ ppgdp, log="xy", data=UN, id=list(n=5))

scatterplot(income ~ type, data=Prestige)

if (FALSE) {
  # interactive point identification
  # remember to exit from point-identification mode
  scatterplot(infantMortality ~ ppgdp, id=list(method="identify"), data=UN)
}


