/* Import a dataset from an Excel file.
   Note: for PROC IMPORT to work correctly, columns of continuous numeric data
   must not have blanks in the first 8 rows; otherwise, it will treat those
   columns as character data. One way around this problem is to create 8 dummy
   rows containing numbers at the top of the Excel file, and then delete those
   rows after import into SAS. */
PROC IMPORT OUT= WORK.parasites
    DATAFILE= "C:\Users\Jason\Desktop\data.xls" /* modify this line to give the correct filepath */
    DBMS=Excel REPLACE;
    SHEET="data";
RUN; quit;

/* Note: There are many other ways to get your data into SAS. You can use the
   menu-driven Import Wizard, under File, Import Data. Or, for simple datasets,
   you can use the INFILE statement within a SAS data step. */

/* The dataset we just imported from Excel is a modified version of the dataset
   used for an analysis published by Hoeksema & Forde (2008). The goal of the
   analysis was to test for local adaptation of parasites to hosts. Thus, the
   response variable of interest is the "effect size" of local adaptation,
   which is calculated by comparing average parasite infectivity in local
   versus non-local host-parasite pairings, i.e. SYMINF vs. ALO_INF in the
   dataset. The calculation of that response variable ("lnRRinf") is performed
   below using a SAS 'data step', which can be used to manipulate datasets in
   a variety of ways. */
data parasites_2; /* give the name of the new dataset.
                     this can be the same as the old one, or not */
    set parasites; /* give the name of the old dataset being modified */
    lnRRinf = log(syminf/alo_inf); /* perform a calculation, creating a new column */
    studyid = _n_; /* insert a column of consecutive integers, one for each line of data
                      (assigned before the delete below, so the IDs of deleted rows are skipped) */
    if host_growth_rate > 100 then delete; /* use if/then statements to delete rows */
    drop syminf alo_inf; /* drop some variables */
run;

/* For analysis of general linear models, with a continuous response variable
   and one or more continuous and/or categorical predictor variables, SAS has
   a number of different procedures (PROCs) that can be used. PROC GLM, which
   uses least-squares estimation of model parameters, can be used for many
   analyses. PROC MIXED, which uses likelihood-based estimation of parameters
   (restricted maximum likelihood, REML, by default), is more flexible and
   should be used for analyzing more complicated experimental designs,
   including repeated-measures and mixed models (with both random and fixed
   effects). */

/* Here is the PROC MIXED code for a one-way ANOVA, asking whether local
   adaptation (lnRRinf) depends on the similarity in gene flow rates between
   hosts and parasites (a categorical variable called GENFLWSIM, with three
   levels). The PROC GLM code would be identical. */
proc mixed data=parasites_2; /* call the MIXED procedure and identify the dataset to analyze */
    class GENFLWSIM; /* list all variables that should be treated as categorical */
    model lnRRinf = GENFLWSIM; /* response variable = predictor variables */
    lsmeans GENFLWSIM / adjust=tukey; /* lsmeans statement obtains least-squares means for all levels of listed categorical variables.
                                         'adjust=tukey' requests Tukey HSD post-hoc pairwise comparisons
                                         between the means for all levels of GENFLWSIM */
run;

/* As an exercise, in the space below, try writing the PROC MIXED or GLM code
   for a simple t-test, asking whether local adaptation (lnRRinf) differs
   between animal versus plant hosts ('hosttype'). */



/* Here is the PROC MIXED code for conducting a simple linear regression,
   asking whether local adaptation (lnRRinf) is linearly predicted by the
   growth rate of the host species (host_growth_rate, a continuous variable).
   Again, the PROC GLM code would be identical for these simple analyses.
   No CLASS statement is needed, because the only predictor is continuous. */
proc mixed data=parasites_2;
    model lnRRinf = host_growth_rate / solution; /* 'solution' asks SAS for slope and intercept */
run;

/* Here is the PROC MIXED code for conducting a mixed-model analysis, with two
   fixed effects (hosttype and host_growth_rate) and one random effect
   (PAPER nested within hosttype). */
proc mixed data=parasites_2;
    class hosttype PAPER;
    model lnRRinf = hosttype host_growth_rate / ddfm=satterth outpred=resids;
    random PAPER(hosttype); /* PAPER is a random effect nested within hosttype */
    lsmeans hosttype;
run;

/* Note: because PAPER(hosttype) appears in the random statement, the F-test
   for the fixed effect of hosttype (from the model statement) will be
   constructed with the appropriate denominator, which is PAPER(hosttype).
   This happens automatically within PROC MIXED, but not within PROC GLM. */

/* Note: ddfm= allows the user to specify different methods for estimating
   the denominator degrees of freedom */

/* The addition of 'outpred=resids' in the model statement in our last
   analysis generated a dataset called 'resids' that contains the residuals
   from the model, and a few other useful bits of data.
   Here is a bit of code for examining a histogram of the residuals */
PROC GCHART data=resids;
    VBAR resid;
RUN; quit;

/* Here is the code for graphing local adaptation (lnRRinf) versus
   host_growth_rate, for each host type separately, using PROC GPLOT.
   The data are sorted by hosttype first, as required by the BY statement. */
proc sort data=parasites_2; by hosttype; run;

proc gplot data=parasites_2;
    by hosttype;
    plot lnRRinf*host_growth_rate;
run; quit;
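
/* Optional sketch, not part of the original analysis: the histogram of
   residuals produced with PROC GCHART above can be complemented by a more
   formal look at normality. The code below assumes that the 'resids'
   dataset (with its 'resid' column) created by 'outpred=resids' in the
   mixed-model analysis still exists in the WORK library. PROC UNIVARIATE
   with the NORMAL option prints tests for normality; the HISTOGRAM and
   QQPLOT statements overlay a fitted normal curve and reference line. */
proc univariate data=resids normal; /* 'normal' requests tests for normality */
    var resid; /* analyze the residuals from the mixed model */
    histogram resid / normal; /* histogram with a fitted normal curve */
    qqplot resid / normal(mu=est sigma=est); /* Q-Q plot against a normal distribution with estimated mean and SD */
run;

/* If the residuals depart strongly from normality, one possible response is
   to reconsider the scale of the response variable or the structure of the
   model before interpreting the F-tests. */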