Sebastian Campbell and José Padarian
Use the left and right arrow keys to navigate slides.
Hopefully by now you've started up RStudio (or your IDE or text editor of choice) and are greeted by a command prompt:
As is traditional in programming, we'll start off with a Hello world!
program. Try the below:
print("Hello world!")
## [1] "Hello world!"
Congratulations! You've taken your first steps into R. You've used the function print to print the text "Hello world!" to the screen. Functions are basically like verbs in language: they tell R to do something. Notice the brackets () after the name of the function, in which you can tell the function what to do. Don't worry too much about the [1] at the beginning; we'll explain that soon. Note that you can also type "Hello world!" without the print command and R will assume you meant to print it.
We can also use R as a calculator for addition and powers:
2 + 2
## [1] 4
2^3
## [1] 8
Note that there are other operators we can use, like * for multiplication, - for subtraction and / for division.
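A quick sketch of those operators in action, with the results R returns:

```r
2 * 3   # multiplication
## [1] 6
7 - 4   # subtraction
## [1] 3
10 / 4  # division
## [1] 2.5
```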
We'll be having a few quizzes like this during this tutorial. No pressure, it's just to check your understanding. Click 'Submit' to check if your answer is right. If you give up, 'Check answer' will show you the answer. 'Clear' will allow you to start that quiz from scratch.
6 / 2
It's 3, because $\frac{6}{2} = 3$.
So now we can do some maths, but we don't want to have to remember the results every time. So let's store some as variables, so we can refer to them by name. To assign, we can use either the = or <- operator (though <- is recommended).
core_volume <- 100
core_mass <- 180
bulk_density <- core_mass/core_volume
bulk_density
## [1] 1.8
Note that assignment doesn't print the variable to the console.
R doesn't just work with numbers; it can also work with "strings" or characters. Strings are denoted by the quotes ("") surrounding them. We've already used these when we made our "Hello world!" program. In fact, we can use the paste function to stick them together.
paste("The bulk density is:", 1.75, sep = " ")
## [1] "The bulk density is: 1.75"
The paste function sticks strings and numbers together into a single string. The sep argument says: "Put a space between all of the things I'm sticking together".
Let's combine all the things we know so far into something useful. Remember the bulk density we calculated earlier?
paste("The bulk density is:", bulk_density, sep = " ")
## [1] "The bulk density is: 1.8"
Here R evaluated our variable bulk_density and added it to the string we were trying to make.
"1:2:3:x:y:z"
The first option. The second option won't work as there's only one string, so R can't place separators between strings. The third option is trying to call variables named x, y and z (but it's likely none exist). Finally, the fourth option is space-separated, not colon-separated.
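For reference, here is a paste call that produces that string (a sketch, since the quiz's exact options aren't reproduced here — every element is quoted or numeric, and the separator is a colon):

```r
# Numbers are coerced to strings, then everything is joined with ":"
paste(1, 2, 3, "x", "y", "z", sep = ":")
## [1] "1:2:3:x:y:z"
```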
So we've been dealing with individual strings and numbers; let's move on to combining those into vectors. A vector is basically just a 1-dimensional array of data, like a set of numbers or a shopping list. It's really easy to make vectors of sequential integers:
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
5:-5
## [1] 5 4 3 2 1 0 -1 -2 -3 -4 -5
You can also make vectors of strings or numbers using the combine (c) function:
c(5, 4, 3, 2, 8)
## [1] 5 4 3 2 8
c("sand", "silt", "clay")
## [1] "sand" "silt" "clay"
But be careful. These kinds of vectors can only take one type of data. If you combine numbers and strings together:
c("SOILID001", 3, 5, "SOILID006", 7)
## [1] "SOILID001" "3" "5" "SOILID006" "7"
Then they all become characters.
If you think about it, you can treat numbers as letters, but it's a lot harder to treat letters as numbers.
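Going the other way is lossy: as.numeric turns anything that isn't a valid number into NA (with a "NAs introduced by coercion" warning). A small sketch:

```r
mixed <- c("SOILID001", "3", "5")
as.numeric(mixed)  # the ID becomes NA; the digit strings convert cleanly
## [1] NA  3  5
```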
Now that we can make vectors, we can make subsets of bigger datasets, matrices and dataframes. Just to start off, we'll use a simple dataset which comes with R: trees (try ?trees for more information about the data). It is a data frame: a collection of vectors as columns. The vectors are allowed to be different types (character, integer, numeric), but they must all be the same length.
trees
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
## 7 11.0 66 15.6
## 8 11.0 75 18.2
## 9 11.1 80 22.6
## 10 11.2 75 19.9
## 11 11.3 79 24.2
## 12 11.4 76 21.0
## 13 11.4 76 21.4
## 14 11.7 69 21.3
## 15 12.0 75 19.1
## 16 12.9 74 22.2
## 17 12.9 85 33.8
## 18 13.3 86 27.4
## 19 13.7 71 25.7
## 20 13.8 64 24.9
## 21 14.0 78 34.5
## 22 14.2 80 31.7
## 23 14.5 74 36.3
## 24 16.0 72 38.3
## 25 16.3 77 42.6
## 26 17.3 81 55.4
## 27 17.5 82 55.7
## 28 17.9 80 58.3
## 29 18.0 80 51.5
## 30 18.0 80 51.0
## 31 20.6 87 77.0
The basic form of a subset is object[whichrows, whichcolumns]. Leaving an element blank means "select all". So, for example, to get the third row of trees, or the second column:
trees[3, ]
## Girth Height Volume
## 3 8.8 63 10.2
trees[, 2]
## [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69 75 74 85 86 71 64 78 80 74
## [24] 72 77 81 82 80 80 80 87
You might have noticed the [24] in the second line of trees[, 2]. This indicates that the 72 next to it is the 24th element of that vector; there just isn't enough room on the screen to show it all in one line.
We can get individual values by setting both rows and columns:
trees[3, 2]
## [1] 63
trees[1, 1]
## [1] 8.3
And we can also use nrow and ncol (which get the number of rows or columns) to get the last elements:
nrow(trees)
## [1] 31
trees[nrow(trees), ]
## Girth Height Volume
## 31 20.6 87 77
trees[, ncol(trees)]
## [1] 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 24.2 21.0 21.4 21.3
## [15] 19.1 22.2 33.8 27.4 25.7 24.9 34.5 31.7 36.3 38.3 42.6 55.4 55.7 58.3
## [29] 51.5 51.0 77.0
You can also subset collections of elements. To get the fourth to seventh row of the first and third column:
trees[4:7, c(1, 3)]
## Girth Volume
## 4 10.5 16.4
## 5 10.7 18.8
## 6 10.8 19.7
## 7 11.0 15.6
Using the line trees[nrow(trees) - 2, 2] will get you the third-last row of the second column. Third-last is the last one minus 2 places.
Data frames are collections of vectors, so you can use their names to extract them. You can use the original bracket notation and also $ notation (which is convenient, but not as powerful).
trees[, "Height"]
## [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69 75 74 85 86 71 64 78 80 74
## [24] 72 77 81 82 80 80 80 87
trees$Height
## [1] 70 65 63 72 81 83 66 75 80 75 79 76 76 69 75 74 85 86 71 64 78 80 74
## [24] 72 77 81 82 80 80 80 87
Often, we want all the rows that meet a certain condition. Say I want all the trees where the height is greater than 82 ft. First, I can produce a TRUE/FALSE variable, then take only the cases where it is true:
greater82 <- trees$Height > 82
greater82[1:5]
## [1] FALSE FALSE FALSE FALSE FALSE
trees[greater82, ]
## Girth Height Volume
## 6 10.8 83 19.7
## 17 12.9 85 33.8
## 18 13.3 86 27.4
## 31 20.6 87 77.0
Using the code trees[trees$Girth < 12.5, ], we can see that there are 15 trees with a girth less than 12.5.
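To check that count yourself, combine the logical subset with nrow (trees ships with base R, so this runs anywhere):

```r
small_trees <- trees[trees$Girth < 12.5, ]
nrow(small_trees)
## [1] 15
```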
But how do I find out about all these functions? There are a few ways of getting help in R:
?print or help(print)
RSiteSearch("soil")
At the moment, the help files will be fairly hard to get through, but they make more and more sense the more you use R. Some help files will be much simpler than others. If you find a help file impossible to get through, try testing the examples at the bottom of most help files, or run them using the example function.
What does the sep argument of read.csv do? The read.csv help entry, found using ?read.csv or help(read.csv), says sep is the 'field separator character'.
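A small sketch of sep at work, using read.csv's text argument so no file is needed (the column names and values here are made up):

```r
# A semicolon-separated "file" supplied as a string
csv_text <- "a;b\n1;2\n3;4"
read.csv(text = csv_text, sep = ";")
##   a b
## 1 1 2
## 2 3 4
```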
Now we're ready to deal with some real data. We have some soil temperature data (University of Alaska Fairbanks, 2011). The best function to read in CSVs is read.csv. Let's store the result and call it soiltempdata.
url <- "https://raw.github.com/sebastian-c/r-mar2013/gh-pages/soiltempdata.csv"
soiltempdata <- read.csv(url)
That looks like it went through nicely. No errors, no complaints.
It's a good idea to look at the first few rows of your data, to make sure it went through ok. We'll use head to get the first 4 rows of ours.
head(soiltempdata, 4)
## COL. DATA.TYPE X X.1 X.2 X.3 X.4 X.5 X.6
## 1 1 Date Time (AST)
## 2 2 0-cm Depth Soil Temperature (degrees C)
## 3 3 5-cm Depth Soil Temperature (degrees C)
## 4 4 10-cm Depth Soil Temperature (degrees C)
## X.7 X.8 X.9 X.10 X.11 X.12 X.13 X.14
## 1
## 2
## 3
## 4
Blech. That's not what we wanted at all. Most of those columns are empty.
What went wrong? Let's download the data and open it up in Excel or Notepad and look at it (ideally, you should do this before trying to read it in). Download the CSV from the URL above (if you're feeling adventurous, try using the download.file function to download it; be sure to use the mode = "wb" option). Save it to your hard drive (put it into My Documents), then open the CSV file.
Well there's the problem, there's all this other stuff before we get to the actual data.
Set your working directory to where you stored the data. You can use the getwd function to work out what directory you're in and the setwd function to set the working directory. Remember to use forward slashes (/) as the path separator.
getwd()
## [1] "C:/Research/"
setwd("C:/Users/Sebastian/Documents")
getwd()
## [1] "C:/Users/Sebastian/Documents"
Ok, so we had a problem with the data before; let's see if we can read it in properly this time. The first problem is that we have a few lines before we get to the data. We could remove these manually, but R has a neat trick to sort this problem out: the skip argument in read.csv.
There are 16 lines we have to skip (not 17, we want to keep the headers).
Now after all these problems, we can finally read in the data and store it as an object. We can see it actually has columns we want.
soiltempdata <- read.csv("soiltempdata.csv", skip = 16)
head(soiltempdata, 3)
## ID Year Month Day Hour Depth.0.cm Depth.5.cm Depth.10.cm Depth.15.cm
## 1 1 2011 1 1 0 -6.752 -4.940 -5.061 -4.401
## 2 2 2011 1 1 1 -6.727 -4.916 -5.098 -4.378
## 3 3 2011 1 1 2 -6.726 -4.916 -5.098 -4.377
## Depth.20.cm Depth.40.cm Depth.60.cm Depth.80.cm Depth.100.cm
## 1 -3.698 -2.014 0.041 -0.215 7777
## 2 -3.733 -1.993 0.010 -0.195 7777
## 3 -3.733 -1.992 0.010 -0.195 7777
## Depth.120.cm Depth.135.cm Depth.150.cm
## 1 -0.525 -0.577 -0.995
## 2 -0.505 -0.608 -1.028
## 3 -0.504 -0.608 -1.027
Another way of examining objects is the str function.
str(soiltempdata)
## 'data.frame': 6552 obs. of 17 variables:
## $ ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ Month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Day : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Hour : int 0 1 2 3 4 5 6 7 8 9 ...
## $ Depth.0.cm : num -6.75 -6.73 -6.73 -6.65 -6.67 ...
## $ Depth.5.cm : num -4.94 -4.92 -4.92 -4.91 -4.86 ...
## $ Depth.10.cm : num -5.06 -5.1 -5.1 -5.03 -5.04 ...
## $ Depth.15.cm : num -4.4 -4.38 -4.38 -4.37 -4.38 ...
## $ Depth.20.cm : num -3.7 -3.73 -3.73 -3.72 -3.68 ...
## $ Depth.40.cm : num -2.01 -1.99 -1.99 -2.04 -2.05 ...
## $ Depth.60.cm : num 0.041 0.01 0.01 0.02 0.004 0.04 0.078 0.071 0.055 0.05 ...
## $ Depth.80.cm : num -0.215 -0.195 -0.195 -0.236 -0.2 -0.165 -0.177 -0.184 -0.201 -0.205 ...
## $ Depth.100.cm: int 7777 7777 7777 7777 7777 7777 7777 7777 7777 7777 ...
## $ Depth.120.cm: num -0.525 -0.505 -0.504 -0.546 -0.51 -0.526 -0.487 -0.494 -0.511 -0.515 ...
## $ Depth.135.cm: num -0.577 -0.608 -0.608 -0.598 -0.614 -0.578 -0.591 -0.598 -0.615 -0.619 ...
## $ Depth.150.cm: num -0.995 -1.028 -1.027 -1.017 -0.98 ...
Something still isn't right. Look at the 100 cm depth soil temperature from the previous slide. Note that you can use the $ symbol to subset out individual columns.
str(soiltempdata$Depth.100.cm)
## int [1:6552] 7777 7777 7777 7777 7777 7777 7777 7777 7777 7777 ...
There is no way that there is a soil with a temperature of 7777 degrees Celsius. So why is that present? Well, if we reopen the data in Excel or Notepad, we'll see there were two important lines:
So it would seem that anything with these values should be a missing value. It turns out R has a trick for that too. Let's read the data in one more time (last time, I promise).
If we look at the help file for read.csv (?read.csv), we see there's an argument called na.strings. We have two values we need to be NA, so we combine them like we did a few slides ago: c(6999, 7777).
soiltempdata <- read.csv("soiltempdata.csv", skip = 16, na.strings = c(6999,
7777))
str(soiltempdata$Depth.100.cm)
## logi [1:6552] NA NA NA NA NA NA ...
And now it's full of NAs (missing values), as it should be.
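Those NAs matter downstream: most summary functions return NA unless told to drop the missing values, and the na.rm argument is the usual escape hatch. A tiny sketch with made-up values:

```r
temps <- c(-6.75, NA, -6.73)
mean(temps)                # NA wins by default
## [1] NA
mean(temps, na.rm = TRUE)  # drop the NAs first
## [1] -6.74
```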
A summary of a dataframe's columns can easily be obtained with summary:
summary(soiltempdata)
## ID Year Month Day
## Min. : 1 Min. :2011 Min. :1.00 Min. : 1.0
## 1st Qu.:1639 1st Qu.:2011 1st Qu.:3.00 1st Qu.: 8.0
## Median :3276 Median :2011 Median :5.00 Median :16.0
## Mean :3276 Mean :2011 Mean :5.02 Mean :15.7
## 3rd Qu.:4914 3rd Qu.:2011 3rd Qu.:7.00 3rd Qu.:23.0
## Max. :6552 Max. :2011 Max. :9.00 Max. :31.0
## Hour Depth.0.cm Depth.5.cm Depth.10.cm
## Min. : 0.00 Min. :-10.87 Min. :-9.90 Min. :-9.88
## 1st Qu.: 5.75 1st Qu.: -8.70 1st Qu.:-8.24 1st Qu.:-8.32
## Median :11.50 Median : -3.79 Median :-2.79 Median :-2.92
## Mean :11.50 Mean : 0.44 Mean : 0.17 Mean :-1.07
## 3rd Qu.:17.25 3rd Qu.: 7.33 3rd Qu.: 6.32 3rd Qu.: 4.49
## Max. :23.00 Max. : 37.21 Max. :32.20 Max. :22.02
## Depth.15.cm Depth.20.cm Depth.40.cm Depth.60.cm
## Min. :-9.52 Min. :-9.33 Min. :-8.81 Min. :-7.787
## 1st Qu.:-8.04 1st Qu.:-7.86 1st Qu.:-7.51 1st Qu.:-6.435
## Median :-2.58 Median :-2.28 Median :-1.82 Median :-1.444
## Mean :-1.51 Mean :-1.87 Mean :-2.75 Mean :-2.846
## 3rd Qu.: 3.90 3rd Qu.: 3.31 3rd Qu.: 1.08 3rd Qu.: 0.199
## Max. :14.76 Max. :10.35 Max. : 4.03 Max. : 1.251
## Depth.80.cm Depth.100.cm Depth.120.cm Depth.135.cm
## Min. :-7.473 Mode:logical Min. :-6.935 Min. :-6.709
## 1st Qu.:-6.212 NA's:6552 1st Qu.:-5.713 1st Qu.:-5.495
## Median :-2.009 Median :-2.482 Median :-2.577
## Mean :-3.170 Mean :-3.230 Mean :-3.205
## 3rd Qu.:-0.449 3rd Qu.:-0.977 3rd Qu.:-1.096
## Max. :-0.122 Max. :-0.487 Max. :-0.574
## Depth.150.cm
## Min. :-6.90
## 1st Qu.:-5.69
## Median :-3.04
## Mean :-3.59
## 3rd Qu.:-1.63
## Max. :-0.98
A simple exploratory analysis could start, for example, with a histogram to observe the distribution of the data. Try using the function hist with the temperatures at depth 0:
hist(soiltempdata$Depth.0.cm)
(Pretty skewed, huh?)
We can now do some basic plotting. Let's look at the 0 cm depth over time. This is easy using the plot function. We need to specify the x variable and the y variable. We'll use the type = "l" argument to make it a line plot (the default uses points):
plot(soiltempdata$ID, soiltempdata$Depth.0.cm, type = "l")
What if we have a special interest in midday temperatures? Let's create a histogram of midday temperatures at depth 0:
hist(soiltempdata$Depth.0.cm[soiltempdata$Hour == 12])
Another simple and useful plot is the boxplot. Again, we use the soil temperature at depth 0 and the function boxplot. Note that we can fix up those messy axis labels with the xlab and ylab arguments:
boxplot(Depth.0.cm ~ Month, data = soiltempdata, xlab = "Month", ylab = "Temperature")
At the moment the dataset is usable, but is it tidy (Wickham, 2011)? To be tidy, a dataset must have the following qualities: each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
This data has multiple columns for temperature variables. It would be easier to deal with if all temperatures were in the same column. Ideally, we would have a column for date, a column for depth and a column for temperature. This is often called converting from 'wide' format (many columns, few rows) to 'long' format (few columns, many rows).
One of the best ways to rearrange your data is to use the reshape2 package (Wickham, 2007). You can use the library or the require function to load it.
You may get an error:
library(reshape2)
## Error: there is no package called 'reshape2'
If this happens to you, then reshape2 has not yet been installed. Use install.packages("reshape2") to install it and library(reshape2) to load it.
Now we have the reshape2 package loaded, we can use the melt function to melt our dataframe. See the ?melt.data.frame page for details. The id.vars are the variables we don't want to melt, and the .name arguments control the column names of the result:
library(reshape2)
soiltemp_melt <- melt(soiltempdata, id.vars = c("ID", "Year", "Month", "Day",
"Hour"), variable.name = "Depth", value.name = "Temperature")
head(soiltemp_melt)
## ID Year Month Day Hour Depth Temperature
## 1 1 2011 1 1 0 Depth.0.cm -6.752
## 2 2 2011 1 1 1 Depth.0.cm -6.727
## 3 3 2011 1 1 2 Depth.0.cm -6.726
## 4 4 2011 1 1 3 Depth.0.cm -6.651
## 5 5 2011 1 1 4 Depth.0.cm -6.671
## 6 6 2011 1 1 5 Depth.0.cm -6.627
See, now we have depth in one column and temperature in another. That's not really good enough, though; our Depth column is still not useful. Use the table function to get a summary of the values in it:
table(soiltemp_melt$Depth)
##
## Depth.0.cm Depth.5.cm Depth.10.cm Depth.15.cm Depth.20.cm
## 6552 6552 6552 6552 6552
## Depth.40.cm Depth.60.cm Depth.80.cm Depth.100.cm Depth.120.cm
## 6552 6552 6552 6552 6552
## Depth.135.cm Depth.150.cm
## 6552 6552
There are 6552 of each Depth measurement. But the depth measurement isn't a number, it's a character with a number in it. What if we could get that number out and use this column as a numeric (like Temperature is)?
We can use regular expressions to extract the numbers. First, we'll use the regexpr function to find the locations of the numbers:
num_matches <- regexpr("[[:digit:]]+", soiltemp_melt$Depth)
str(num_matches)
## atomic [1:78624] 7 7 7 7 7 7 7 7 7 7 ...
## - attr(*, "match.length")= int [1:78624] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "useBytes")= logi TRUE
The string [[:digit:]]+ is a special string: [[:digit:]] means any digit from 0-9, and the + on the end means "one or more". So: find one or more digits. The results mean that the match starts at the 7th character ("Depth.0") and is only 1 character long. If we were to scroll down to "Depth.150.cm", we'd see that the match also starts at the 7th character and is 3 characters long.
We can use the regmatches function to convert from these positions to the actual numbers they match, then assign that back to our Depth column.
depth_nums <- regmatches(soiltemp_melt$Depth, num_matches)
head(depth_nums)
## [1] "0" "0" "0" "0" "0" "0"
Hmm... characters (you can tell by the quotes). We had better convert those to numbers before we replace our Depth column.
soiltemp_melt$Depth <- as.numeric(depth_nums)
Now that we have stacked data, it's much easier to make plots. For example, it's easy to make a simple scatterplot of Temperature vs Depth.
plot(Temperature ~ Depth, data = soiltemp_melt)
We can also make better boxplots:
boxplot(Temperature ~ Depth, data = soiltemp_melt, xlab = "Depth", ylab = "Temperature")
Stacking our data has also made it easier to summarise. We're going to use the plyr package; in particular, a function called ddply. Let's start with something simple: the mean temperature for each Depth.
library(plyr)
depth_mean <- ddply(soiltemp_melt, c("Depth"), summarise, meandepth = mean(Temperature,
na.rm = TRUE))
depth_mean
## Depth meandepth
## 1 0 0.4403
## 2 5 0.1725
## 3 10 -1.0736
## 4 15 -1.5077
## 5 20 -1.8686
## 6 40 -2.7486
## 7 60 -2.8459
## 8 80 -3.1703
## 9 100 NaN
## 10 120 -3.2299
## 11 135 -3.2049
## 12 150 -3.5913
We can now plot that easily:
plot(meandepth ~ Depth, data = depth_mean)
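If you'd rather not load a package, base R's aggregate can compute the same per-group mean. A sketch on a toy data frame (the values here are made up, not the soil data; note that the formula interface drops NA rows by default):

```r
toy <- data.frame(Depth       = c(0, 0, 5, 5),
                  Temperature = c(-6.7, -6.5, -4.9, NA))
aggregate(Temperature ~ Depth, data = toy, mean)
##   Depth Temperature
## 1     0        -6.6
## 2     5        -4.9
```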
What if we want more? Say we want to know the temperature distribution during the day for each depth.
depthday_dist <- ddply(soiltemp_melt, c("Hour", "Depth"), summarise, meandepthday = mean(Temperature))
head(depthday_dist, 2)
## Hour Depth meandepthday
## 1 0 0 -2.688
## 2 0 5 -2.249
That summary is long and hard to plot on a simple graph. How can we make a graph that will plot something like that?
ggplot2 is a package which allows you to build up complex plots from basic elements (Wickham, 2009). First we define the data of the plot in ggplot, then define the geoms.
library(ggplot2)
ggplot(depthday_dist, aes(x = Hour, y = meandepthday)) + geom_point()
## Warning: Removed 24 rows containing missing values (geom_point).
But how do we tell the points apart? Maybe we could give them different colours and connect the dots by lines? We'll convert Depth to factor so it isn't treated like a number (see what happens if you don't convert it to factor).
ggplot(depthday_dist, aes(x = Hour, y = meandepthday, colour = as.factor(Depth))) +
geom_line() + labs(colour = "Depth")
We can go back to our original melted data and look at all the depths over time if we facet by Depth.
ggplot(soiltemp_melt, aes(x = ID, y = Temperature)) + geom_line() + facet_wrap(~Depth,
ncol = 3)
If we want to check the relation between, for example, the temperature at depth 0 and depth 5, we can use the function lm:
temp0 <- soiltemp_melt$Temperature[soiltemp_melt$Depth == 0]
temp5 <- soiltemp_melt$Temperature[soiltemp_melt$Depth == 5]
linear_model <- lm(temp0 ~ temp5) ## temp0 and temp5 stored as subsets
We can access some important information using summary:
summary(linear_model)
##
## Call:
## lm(formula = temp0 ~ temp5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.764 -0.869 0.085 0.410 6.490
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.24580 0.01453 16.9 <2e-16 ***
## temp5 1.12760 0.00153 739.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.18 on 6550 degrees of freedom
## Multiple R-squared: 0.988, Adjusted R-squared: 0.988
## F-statistic: 5.46e+05 on 1 and 6550 DF, p-value: <2e-16
Or using $ in combination with summary:
summary(linear_model)$r.squared
## [1] 0.9881
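summary isn't the only extractor; coef and predict work in much the same way. A sketch on a toy model with a known intercept and slope (toy data, not the soil temperatures):

```r
x <- 1:10
y <- 2 * x + 1          # a perfect line: intercept 1, slope 2
toy_model <- lm(y ~ x)
coef(toy_model)                          # recovers intercept 1 and slope 2
predict(toy_model, data.frame(x = 11))   # 2 * 11 + 1 = 23
```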
We can produce familiar tables containing sums of squares and p-values using the anova function:
anova(linear_model)
## Analysis of Variance Table
##
## Response: temp0
## Df Sum Sq Mean Sq F value Pr(>F)
## temp5 1 755095 755095 546054 <2e-16 ***
## Residuals 6550 9057 1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Let's plot the previous data and fitted linear model:
plot(temp5, temp0)
abline(linear_model, col = "blue", lwd = 2)
When you have a bit more R experience, you can make graphs like the following:
We can make our first plot a lot fancier if we want:
Plot soil profiles with the help of the aqp package:
Plot soil profiles locations:
This presentation was entirely made in R using some useful packages such as slidify (Vaidyanathan, 2012). You can look at the source for it on GitHub.
University of Alaska Fairbanks (2011). South White Hills Met Station (DFM1). http://ine.uaf.edu/werc/projects/foothills/stations/dfm1/historical.html.
Vaidyanathan R (2012). slidify: Generate reproducible html5 slides from R markdown. R package version 0.3.3, http://ramnathv.github.com/slidify/.
Wickham H (2007). “Reshaping Data with the reshape Package.” Journal of Statistical Software, 21(12), pp. 1–20. http://www.jstatsoft.org/v21/i12/.
Wickham H (2009). ggplot2: elegant graphics for data analysis. Springer New York. ISBN 978-0-387-98140-6, http://had.co.nz/ggplot2/book.
Wickham H (2011). “Tidy data.” The American Statistician. http://vita.had.co.nz/papers/tidy-data.html.