Report for a Data Analysis Project

Authors

Author affiliations

  1. Department of Cellular Biology, University of Georgia, Athens, GA, USA.
  2. Department of Plant Pathology, University of Georgia, Athens, GA, USA.

* These authors contributed equally to this work.

^ Corresponding author: aw59557@uga.edu

1 Methods

1.1 Data acquisition

The dataset used in this study was provided with the data-analysis project template and can be found in data/raw-data. The dataset was updated to include two new variables, as described below.

1.2 Description of data and data source

Two new variables were added to the dataset, and the updated dataset was saved as exampledata2.xlsx. The first variable is age, the age of each individual in years, reported as a numeric value >0 or NA (not applicable). The second variable is marital status, reported as S (Single), M (Married), D (Divorced), W (Widowed), or NA. Both variables are documented in the Codebook sheet of the updated dataset. The other variables are (1) height, reported in centimeters as a numeric value >0, (2) weight, reported in kilograms as a numeric value >0, and (3) gender, reported as male, female, or other.
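The variable rules described in this section can be expressed as a small validation routine. The following is a minimal sketch in Python, not the project's own code: the field names (`height`, `weight`, `age`, `marital_status`) and the use of `None` to stand in for NA are assumptions made for illustration.

```python
# Allowed marital-status codes, taken from the variable description above.
MARITAL_CODES = {"S", "M", "D", "W"}


def is_positive_number(value):
    """Return True for an int or float strictly greater than zero."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return value > 0


def validate_record(record):
    """Return a list of rule violations for one record (empty if none).

    Field names are hypothetical; None stands in for NA in this sketch.
    """
    problems = []
    # Height and weight must be numeric values > 0.
    for field in ("height", "weight"):
        if not is_positive_number(record.get(field)):
            problems.append(f"{field} must be a numeric value > 0")
    # Age may be NA, otherwise it must be a numeric value > 0.
    age = record.get("age")
    if age is not None and not is_positive_number(age):
        problems.append("age must be a numeric value > 0 or NA")
    # Marital status may be NA, otherwise it must be one of the codes.
    status = record.get("marital_status")
    if status is not None and status not in MARITAL_CODES:
        problems.append("marital_status must be S, M, D, W, or NA")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single pass report every violation in a record.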

1.3 Data import and cleaning

Data import and cleaning were performed using scripts located in code/processing-code. Cleaning included identifying obvious outliers (e.g., a weight of 7,000 kg) and removing them from the dataset, among other related steps. The cleaned dataset was saved as processeddata2.rds in the results directory for use in downstream analysis.
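The outlier-removal step can be illustrated with a range filter. This is a minimal Python sketch, not the logic of the scripts in code/processing-code: the plausibility bounds, the field names, and the example rows are all assumptions chosen for illustration.

```python
# Plausibility bounds are illustrative assumptions, not the thresholds
# used by the project's processing scripts.
PLAUSIBLE_RANGES = {
    "height": (50.0, 250.0),  # centimeters
    "weight": (2.0, 500.0),   # kilograms
}


def is_plausible(record):
    """Return True if every bounded field is present and within range."""
    for field, (low, high) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is None or not (low <= value <= high):
            return False
    return True


def remove_outliers(records):
    """Keep only the records whose bounded fields are all plausible."""
    return [record for record in records if is_plausible(record)]


# Hypothetical rows: the second has an implausible weight of 7,000 kg.
rows = [
    {"height": 180, "weight": 80},
    {"height": 175, "weight": 7000},
]
cleaned = remove_outliers(rows)
```

In this sketch a record with a missing height or weight is dropped as well. Whether missing values should instead be kept is a design decision the original text does not specify.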

1.4 Statistical analysis

Exploratory data analysis was performed using the script code/eda-code/eda-copy.qmd. A boxplot was generated to examine the distribution of height in relation to marital status. A scatterplot was generated to examine the relationship between weight and age. These figures were saved to the results/figures directory.
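A boxplot of height by marital status summarizes each group by its minimum, quartiles, and maximum. The sketch below computes those summaries in Python using only the standard library. The example rows are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since plotting tools differ in how they define quartiles.

```python
from collections import defaultdict
from statistics import quantiles


def boxplot_stats(rows):
    """Group (status, height) pairs and summarize each group."""
    groups = defaultdict(list)
    for status, height in rows:
        groups[status].append(height)

    summary = {}
    for status, heights in groups.items():
        if len(heights) < 2:
            continue  # skip groups too small to compute quartiles
        q1, median, q3 = quantiles(heights, n=4, method="inclusive")
        summary[status] = {
            "min": min(heights),
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": max(heights),
        }
    return summary


# Hypothetical data, not values from the project's dataset.
example_rows = [
    ("M", 160.0), ("M", 170.0), ("M", 180.0),
    ("S", 150.0), ("S", 170.0),
]
stats = boxplot_stats(example_rows)
```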

Linear models were used to further explore the data. In addition to the previously specified models, a third linear model was fit with height as the outcome and age and marital status as predictors. The results of this model were saved as resulttable3.rds in the results/tables directory.
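A categorical predictor such as marital status enters a linear model through indicator (dummy) variables. The sketch below shows one common scheme, treatment coding with the alphabetically first level as the reference, which is the default for unordered factors in R. That the third model used this scheme is an assumption, and the function name `design_row` is hypothetical.

```python
# Levels sorted alphabetically; the first one ("D") is the reference level
# and gets no indicator column of its own.
STATUS_LEVELS = ["D", "M", "S", "W"]


def design_row(age, status):
    """Return [intercept, age, is_M, is_S, is_W] for one observation."""
    if status not in STATUS_LEVELS:
        raise ValueError(f"unknown marital status: {status!r}")
    indicators = [1.0 if status == level else 0.0 for level in STATUS_LEVELS[1:]]
    return [1.0, float(age)] + indicators
```

For example, `design_row(40, "S")` gives `[1.0, 40.0, 0.0, 1.0, 0.0]`, while a divorced individual of the same age gets zeros in all three indicator columns, so the coefficient of each indicator is interpreted relative to the reference level.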

2 Results

2.1 Analysis

The dataset contained five variables:

- Height, reported in centimeters.
- Weight, reported in kilograms.
- Gender, reported as either male, female, or other.
- Age, reported in years.
- Marital Status, reported as either single, married, divorced, or widowed.

We first conducted exploratory analyses to examine the distributions of age, height, and weight. Figure 1 shows the distribution of height, Figure 2 the distribution of weight, and Figure 3 the distribution of age.

Figure 1: Distribution of Height
Figure 2: Distribution of Weight
Figure 3: Distribution of Age

Table 1 shows a summary of the data.

Table 1: Data summary table.

Figure 4 shows a scatterplot, produced by one of the R scripts, of the relationship between height and weight stratified by gender.

Figure 4: Height and weight stratified by gender.

Figure 5 shows the distribution of marital status. Figure 6 shows the boxplot of height by marital status, and Figure 7 shows the scatterplot of weight against age.

Figure 5: Distribution of Marital Status
Figure 6: Marital Status vs. Height
Figure 7: Age vs. Weight

Below is a summary of all linear model fits.

Table 2: Linear model fit table.

Model 1 (predictor: Weight)
term                 estimate    std.error    statistic      p.value
(Intercept)       149.6997661   19.7518528    7.5790240    0.0001285
Weight              0.2277371    0.2708841    0.8407177    0.4282860

Model 2 (predictors: Weight, Gender)
term                 estimate    std.error    statistic      p.value
(Intercept)       149.2726967   23.3823360    6.3839942    0.0013962
Weight              0.2623972    0.3512436    0.7470519    0.4886517
GenderM            -2.1244913   15.5488953   -0.1366329    0.8966520
GenderO            -4.7644739   19.0114155   -0.2506112    0.8120871

Model 3 (predictors: Age, Marital Status)
term                 estimate    std.error    statistic      p.value
(Intercept)       205.9240122   47.5127075    4.3340829    0.0123101
Age                -0.5094225    0.9626066   -0.5292115    0.6246660
Marital StatusM   -30.2686930   26.3026870   -1.1507833    0.3139342
Marital StatusS   -25.6790274   30.1016654   -0.8530766    0.4416881
Marital StatusW    -2.9908815   35.6700641   -0.0838485    0.9372056
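To make the columns of Table 2 concrete, the sketch below uses the estimates for the model with age and marital status as predictors. It shows that each value in the statistic column is the estimate divided by its standard error, and how the estimates combine into a fitted height. It is a minimal Python illustration under one assumption: because no term for status D appears in the table, D is treated as the reference level. The function names are hypothetical.

```python
# Estimates and standard errors copied from Table 2 (age and marital
# status model). Each value is a pair: (estimate, std.error).
MODEL3 = {
    "(Intercept)": (205.9240122, 47.5127075),
    "Age": (-0.5094225, 0.9626066),
    "Marital StatusM": (-30.2686930, 26.3026870),
    "Marital StatusS": (-25.6790274, 30.1016654),
    "Marital StatusW": (-2.9908815, 35.6700641),
}


def t_statistic(term):
    """Return estimate / std.error, the 'statistic' column of the table."""
    estimate, std_error = MODEL3[term]
    return estimate / std_error


def predict_height(age, status):
    """Fitted height in cm; status D is assumed to be the reference level."""
    height = MODEL3["(Intercept)"][0] + MODEL3["Age"][0] * age
    if status != "D":
        height += MODEL3[f"Marital Status{status}"][0]
    return height
```

For instance, `predict_height(30, "M")` evaluates 205.9240122 - 0.5094225 × 30 - 30.2686930. Because none of the predictor terms in Table 2 has a p-value below 0.05, such fitted values are illustrative of how the coefficients combine rather than evidence of an effect.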