/* Import a dataset from an Excel file. Note: for PROC IMPORT to work correctly, columns of
continuous numeric data must not have blanks in the first 8 rows; otherwise, SAS
will treat those columns as character data. One workaround is to add 8 dummy rows of
numbers at the top of the Excel file, and then delete those rows after importing into SAS. */
PROC IMPORT OUT= WORK.parasites
DATAFILE= "C:\Users\Jason\Desktop\data.xls" /* modify this line to point to the file path on your own computer */
DBMS=Excel REPLACE;
SHEET="data";
RUN;
quit;
/* Note: There are many other ways to get your data into SAS. You can use the menu-driven
feature called the Import Wizard, under File, Import Data. Or, for simple datasets, you can
use the INFILE statement within a SAS data step. */
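/* A minimal sketch of the INFILE approach mentioned above. To keep it self-contained, it reads
comma-delimited values typed directly into the program (DATALINES) rather than an external
file. The dataset name 'infile_demo' and the two data rows are made up for illustration only.
To read a real text file, replace 'datalines' in the INFILE statement with a quoted file path
and remove the DATALINES block. */
data infile_demo;
infile datalines dsd; /* 'dsd' treats commas as delimiters */
input hosttype :$10. host_growth_rate; /* one character column, one numeric column */
datalines;
animal,12.5
plant,3.2
;
run;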
/* The dataset we just imported from Excel is a modified version of the dataset used for
an analysis published by Hoeksema & Forde (2008).
The goal of the analysis was to test for local adaptation of parasites to hosts.
Thus, the response variable of interest is the "effect size" of local adaptation, which
is calculated by comparing average parasite infectivity in local versus non-local
host-parasite pairings, i.e. SYMINF vs. ALO_INF in the dataset. The calculation of that
response variable ("lnRRinf") is performed using a SAS 'data step' below, which is used to
manipulate datasets in a variety of ways. */
data parasites_2; /* give the name of the new dataset. this can be the same as the old one, or not */
set parasites; /* give the name of the old dataset being modified */
lnRRinf = log(syminf/alo_inf); /* perform a calculation (natural log of the ratio), creating a new column */
studyid = _n_; /* insert a column of consecutive integers, one for each row of data */
if host_growth_rate > 100 then delete; /* use an if/then statement to delete rows with host_growth_rate above 100 */
drop syminf alo_inf ; /* drop some variables */
run;
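/* As a quick check on the data step above, the following sketch prints the first few rows of
the new dataset and summarizes the new response variable. The choice of 10 rows and of
these particular summary statistics is arbitrary. */
proc print data=parasites_2 (obs=10);
run;
proc means data=parasites_2 n nmiss mean std min max;
var lnRRinf;
run;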
/* For analysis of general linear models, with a continuous response variable and
one or more continuous and/or categorical predictor variables, SAS has a number of different
procedures (PROCs) that can be used. PROC GLM, which uses least-squares estimation of
model parameters, can be used for many analyses. PROC MIXED, which uses likelihood-based
estimation of parameters (restricted maximum likelihood, REML, by default), is more flexible
and should be used for analyzing more complicated experimental designs, including
repeated-measures and mixed models (with both random and fixed effects). */
/* Here is the PROC MIXED code for conducting a one-way ANOVA, asking whether
local adaptation (lnRRinf) depends on the similarity in gene flow rates between hosts
and parasites (a categorical variable called GENFLWSIM, with three levels).
The PROC GLM code would be identical. */
proc mixed data=parasites_2 ; /* call the MIXED procedure and identify the dataset to analyze */
class GENFLWSIM ; /* list all variables that should be treated as categorical */
model lnRRinf = GENFLWSIM ; /* response variable = predictor variables */
lsmeans GENFLWSIM / adjust=tukey; /* the lsmeans statement obtains least-squares means for
all levels of the listed categorical variables. 'adjust=tukey' requests Tukey HSD post-hoc
pairwise comparisons between the means for all levels of GENFLWSIM */
run;
/* As an exercise, in the space below, try writing the PROC MIXED or GLM code
for conducting a simple t-test, asking whether local adaptation (lnRRinf) differs
between animal versus plant hosts ('hosttype').*/
/* Here is the PROC MIXED code for conducting a simple linear regression, asking
whether local adaptation (lnRRinf) is linearly predicted by the growth rate of the
host species (host_growth_rate, a continuous variable).
Again, the PROC GLM code would be identical for these simple analyses. */
proc mixed data=parasites_2 ;
/* no CLASS statement is needed here, because the model has no categorical predictors */
model lnRRinf = host_growth_rate /solution; /* 'solution' asks SAS for slope and intercept */
run;
/* Here is the PROC MIXED code for conducting a mixed-model analysis, with two
fixed effects (hosttype, which is categorical, and host_growth_rate, which is continuous)
and one random effect (PAPER nested within hosttype). */
proc mixed data=parasites_2 ;
class hosttype PAPER;
model lnRRinf = hosttype host_growth_rate / ddfm=satterth outpred=resids;
random PAPER(hosttype); /* PAPER is a random effect nested within hosttype */
lsmeans hosttype;
run;
/* Note: because PAPER(hosttype) is declared in the RANDOM statement, the F-test for the
fixed effect of hosttype (from the MODEL statement) will be constructed with the
appropriate denominator, which is PAPER(hosttype). This happens automatically within
PROC MIXED, but not within PROC GLM. */
/* Note: ddfm= allows the user to specify different methods for estimating the denominator
degrees of freedom; 'satterth' requests the Satterthwaite approximation. */
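/* For comparison, here is a sketch of how a similar analysis might be written in PROC GLM.
This is not part of the original analysis. In PROC GLM, the random effect must also be listed
in the MODEL statement, and the 'test' option on the RANDOM statement asks SAS to construct
F-tests using error terms based on the expected mean squares. Results will not necessarily
match those from PROC MIXED, because the estimation methods differ. */
proc glm data=parasites_2 ;
class hosttype PAPER;
model lnRRinf = hosttype host_growth_rate PAPER(hosttype);
random PAPER(hosttype) / test; /* request tests that treat PAPER(hosttype) as random */
run;
quit;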
/* The addition of 'outpred=resids' in the model statement in our last analysis
generated a dataset called 'resids' that contains the residuals from the model, and a few other
useful bits of data. Here is a bit of code for examining a histogram of the residuals */
PROC GCHART data=resids;
VBAR resid;
RUN;
quit;
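/* An alternative sketch for examining the same residuals, using PROC UNIVARIATE instead of
PROC GCHART. It draws a histogram with a fitted normal curve and a normal quantile-quantile
plot. This is one possible approach, not part of the original analysis. */
proc univariate data=resids;
var resid;
histogram resid / normal; /* overlay a normal density on the histogram */
qqplot resid / normal(mu=est sigma=est); /* reference line uses the estimated mean and SD */
run;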
/* Here is the code for graphing local adaptation (lnRRinf) versus host_growth_rate, for
each host type separately, using proc gplot */
proc sort data=parasites_2;
by hosttype;
run;
proc gplot data=parasites_2;
by hosttype;
plot lnRRinf*host_growth_rate;
run;
quit;
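/* An alternative sketch, not part of the original analysis: to show all host types on a
single graph instead of one graph per host type, add '=hosttype' to the plot request. Each
level of hosttype is then drawn with its own symbol, and no BY statement is needed. */
proc gplot data=parasites_2;
plot lnRRinf*host_growth_rate=hosttype;
run;
quit;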