Water Temperature and Juvenile Chinook Salmon Fork Length in The San Francisco Estuary

Investigating whether the mean fork length of juvenile Chinook salmon in the San Francisco Estuary differs at sample sites with colder mean water temperatures compared to warmer mean water temperatures.

Charles Hendrickson
12-02-2021

Research Question

The aim of this project is to investigate whether the mean fork length of juvenile Chinook salmon in the San Francisco Estuary differs at sample sites with colder mean water temperatures compared to warmer mean water temperatures.

Introduction

Chinook salmon Oncorhynchus tshawytscha are anadromous fish, which means they are born in freshwater rivers and migrate as juveniles to saltwater to feed, grow, and mature before returning to freshwater to spawn (NOAA Fisheries). They are vulnerable to many anthropogenic stressors, such as habitat degradation, blocked access to spawning grounds, and loss of genetic diversity from hatcheries. A particularly concerning stressor for Chinook salmon are changes in freshwater thermal regimes due to simplification of the physical structure of aquatic systems, reduced shading from the removal of riparian vegetation, and altered flow regimes from climate change and water diversion. (Richter & Kolmes, 2005). In a study that investigated the effects of three temperature regimes on juvenile Chinook salmon growth, it was found that juveniles reared at an elevated temperature range of 21–24°C experienced significantly decreased growth rates compared with juveniles reared at 13–16°C (Marine & Cech Jr, 2004). Furthermore, studies have found that juvenile Chinook salmon length is positively correlated with the survival and survival was markedly low among the smallest length class (that is, 80 mm to 89 mm) of subyearling Chinook salmon (Brown, Oldenburg & Seaburg et al., 2013). As California’s Chinook salmon populations become increasingly threatened by changes in freshwater thermal regimes and with increasing evidence suggesting that juvenile Chinook Salmon survival could be negatively impacted by decreased fork length, it is necessary to understand how the size of out-migrating juvenile Chinook salmon in the San Francisco Estuary may be impacted by the water temperature of their native rivers.

While previous studies have used controlled laboratory experiments to test the effects of water temperature on Chinook salmon growth, I will use a long-term fish monitoring dataset in my analysis to quantify the impact of mean water temperature on the mean fork length of juvenile Chinook salmon.

Data Description and Collection

I downloaded Interagency Ecological Program (IEP) survey data from the Environmental Data Initiative (EDI) Data Portal.

The IEP survey data contains over four decades (1976-2021) of juvenile fish monitoring data from the San Francisco Estuary, collected by the The United States Fish and Wildlife Service Delta Juvenile Fish Monitoring Program (DJFMP). As part of the IEP that manages the San Francisco Estuary, the DJFMP has tracked the relative abundance and distribution of naturally and hatchery produced juvenile Chinook Salmon of all races as they out-migrate through the Sacramento-San Joaquin Delta using a combination of surface trawls and beach seines. Since 2000, three trawl sites and 58 beach seine sites have been sampled weekly or biweekly within the San Francisco Estuary and lower Sacramento and San Joaquin Rivers.

The DJFMP data set contains columns of the sample site location, date, time, sample method, environmental conditions, water temp, species, fish count, and fork length. Each row represents a sample of one or multiple fish (must have the same fork length) at a distinct site and time. The strengths of this data set are that the sample size is very large and it has ample metadata. The limitations of this data set are that its samples were not taken continuously prior to 1995 so there are gaps in the data. After 1995 samples were taken year-round to the expand the temporal monitoring of fish. Also the data set is not already in tidy format because each row does not represent one fish. For example, the ‘Count’ column contains the count of fish with the same fork length per sample date, which should each be separate rows. The full metadata can be found here.

The variables that I am interested in for my analysis are fork length (mm) and water temperature (degrees Celsius).

Fork length is the length of sampled fish measured from the point of the mouth to the fork of the caudal fin. Fish were measured to the nearest 1 mm before being released. Generally, all fish in each sample that were ≥ 25 mm fork length were identified by species.

Show code
# URL of data set: https://portal.edirepository.org/nis/metadataviewer?packageid=edi.244.8

# Import data and convert it to a data frame
DJFMP_df <- as.data.frame(read.csv(here("../data/1976-2021_DJFMP_beach_seine_fish_and_water_quality_data.csv")))
Show code
# Filter data set for just Chinook salmon (CHN) observations 
DJFMP_CHN_df <- DJFMP_df %>% 
  filter(OrganismCode == 'CHN') %>% 
  select(Location, SampleDate, ForkLength, WaterTemp, MethodCode, Count) %>% 
  uncount(Count) %>%
  mutate(SampleDate = as.Date(SampleDate))

# Drop NA's from WaterTemp column
DJFMP_CHN_df <- DJFMP_CHN_df[!is.na(DJFMP_CHN_df$WaterTemp),]
Show code
# Check for outliers
large_outliers <- DJFMP_CHN_df %>% 
  filter(ForkLength > 250)

small_outliers <- DJFMP_CHN_df %>% 
  filter(ForkLength <= 0)

# Remove these data points because juvenile Chinook salmon cannot have a fork length of zero mm and we only want data on juvenile Chinook salmon that are below 250 mm in length, not adults. 
DJFMP_CHN_df <- DJFMP_CHN_df %>% 
  filter(ForkLength != 0 & ForkLength < 250)
Show code
# This data frame contains each Location's means fork length and water temp.
DJFMP_CHN_df_mean <- DJFMP_CHN_df %>% 
          group_by(Location) %>% 
          summarise(
            mean_fork_length = mean(ForkLength), 
            mean_water_temp = mean(WaterTemp)) 
Show code
# Plot juvenile Chinook salmon fork length over time
ggplot(data = DJFMP_CHN_df, aes(x = SampleDate, y = ForkLength)) +
  geom_point() +
  labs(title = "Fork Length of Juvenile Chinook Salmon in 
San Francisco Estuary (1976-2021)",
       x = "Date",
       y = "Fork Length (mm)") +
  theme_classic()
Fork length of juvenile Chinook salmon in San Francisco Estuary from 1976 to 2021.

Figure 1: Fork length of juvenile Chinook salmon in San Francisco Estuary from 1976 to 2021.

The YSI PRO 2030 Meter was used to measure water temperature before or after each fish sample. Water temperature was recorded to the nearest 0.1˚C.

Show code
# Plot water temperature over time
ggplot(data = DJFMP_CHN_df, aes(x = SampleDate, y = WaterTemp)) +
  geom_point() +
  labs(title = "Water Temperature Across Sample Sites in 
San Francisco Estuary (1976-2021)",
       x = "Date",
       y = "Water Temperature (degrees Celsius)") +
  theme_classic()
Water Temperature at sample sites in San Francisco Estuary from 1976 to 2021.

Figure 2: Water Temperature at sample sites in San Francisco Estuary from 1976 to 2021.

Analysis

I will conduct a hypothesis test to determine if there is a difference in the mean fork length of juvenile Chinook salmon across sites with warmer mean water temperatures compared to sites with colder mean water temperatures in the San Francisco Estuary. Specifically, I will use a linear regression to see if mean water temperature affects mean fork length.

First, I will define my null and alternative hypotheses. Second, I will collect and filter my data for location, sample date, fork length, water temperature, method code, and count. The data must be in tidy format so I will use the uncount() function to turn the ‘Count’ column into duplicate rows so that each row represents one observation. To clean the data I will drop NA values from the water temperature column and exclude any adult Chinook salmon samples by removing fork length values greater than 250mm. I will also remove all fork length values of zero mm because these are faulty samples that do not represent real fish.

Third, I will visualize the general relationship between water temperature and fork length before beginning my analysis on the relationship between mean water temperature and mean fork length. I think it is important to get a understanding of the raw data and show where the majority of the data lies.

Fourth, I will then compute the point estimate by: 1) Calculating the mean water temperature across all samples in the data. 2) Calculating the mean water temperature and mean fork length of each site/location. 3) Filtering for sites with mean water temperature above or below the mean water temperature across all samples (gives us warm water and cold water sites). 4) Taking the difference between the mean fork length of the warm water sites and the mean fork length of cold water sites.

This will give me the difference between the mean fork length of juvenile Chinook salmon that are reared in warmer mean water temperatures compared to colder mean water temperatures.

Fifth, I will conduct a t-test to measure the variability of the sample statistic and ensure that my point estimate is not just a difference in the samples due to random variability. Sixth, I will run a linear regression of mean water temperature on mean fork length. Finally, based on my p-value, I will reject or fail to reject my null hypothesis.

Limitations

A limitation of my analysis is that water temperature is the only independent variable included in the model, which means this analysis is vulnerable to omitted variables bias. It is very likely that there are many different abiotic and biotic factors in addition to water temperature that influence juvenile Chinook salmon fork length.

Hypotheses

Null Hypothesis \[H_{0}: \mu_{ForkLengthCold} - \mu_{ForkLengthWarm} = 0\]

Alternative hypothesis \[H_{A}: \mu_{ForkLengthCold} - \mu_{ForkLengthWarm} \neq 0\]

Visualize data

Scatter plot showing changes in juvenile Chinook salmon fork length (\(y\)-axis) as it relates to water temperature (\(x\)-axis).

Show code
forklength_vs_watertemp <- ggplot(data = DJFMP_CHN_df, 
                                  aes(x = WaterTemp, 
                                      y = ForkLength)) +
  geom_point(size = 1) +
  labs(title = "Relationship Between Juvenile Chinook Salmon Fork Length 
and Water Temperature",
       x = "Water Temperature (°C)",
       y = "Fork Length (mm)") +
  theme_bw()

forklength_vs_watertemp
Juvenile Chinook salmon fork length as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Figure 3: Juvenile Chinook salmon fork length as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Density plot of juvenile Chinook salmon fork length (\(y\)-axis) as it relates to water temperature (\(x\)-axis).

Show code
# Determine where data density is relatively high versus low
forklength_vs_watertemp_alpha <- ggplot(data = DJFMP_CHN_df, 
                                        aes(x = WaterTemp, 
                                            y = ForkLength)) +
  geom_point(alpha=0.1, size=1) +
  labs(title = "Relationship Between Juvenile Chinook Salmon Fork Length
and Water Temperature",
       x = "Water Temperature (°C)",
       y = "Fork Length (mm)") +
  theme_bw() 

forklength_vs_watertemp_alpha
The density of juvenile Chinook salmon fork lengths as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Figure 4: The density of juvenile Chinook salmon fork lengths as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Heat map with geom_bin2d() of juvenile Chinook salmon fork length (\(y\)-axis) as it relates to water temperature (\(x\)-axis).

Show code
forklength_vs_watertemp_density <- ggplot(data = DJFMP_CHN_df, 
                                  aes(x = WaterTemp, 
                                      y = ForkLength)) +
  geom_bin2d(bins = 60) +
  labs(title = "Relationship Between Juvenile Chinook Salmon Fork Length 
and Water Temperature",
       x = "Water Temperature (°C)",
       y = "Fork Length (mm)") +
  theme_bw()

forklength_vs_watertemp_density
Heat map of juvenile Chinook salmon fork length as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Figure 5: Heat map of juvenile Chinook salmon fork length as it relates to water temperature at sample sites in the San Francisco Estuary from 1976 to 2021.

Results

Show code
# Mean water temp of all sites 
mean_water_temp_overall <- mean(DJFMP_CHN_df$WaterTemp)


# Warm water sites
warm_areas <- DJFMP_CHN_df %>% 
          group_by(Location) %>% 
          summarise(ForkLength, mean_water_temp = mean(WaterTemp)) %>% 
  filter(mean_water_temp > mean_water_temp_overall) %>%
  summarise(mean_fork_length = mean(ForkLength))
        

# Remove Veale Tract because it is an outlier
warm_mean <- warm_areas %>% 
  filter(Location != "Veale Tract") %>%
  summarise(mean(mean_fork_length))

warm_mean <- as.numeric(warm_mean)
  

# Cold water sites
cold_areas <- DJFMP_CHN_df %>% 
          group_by(Location) %>% 
          summarise(ForkLength, mean_water_temp = mean(WaterTemp)) %>% 
  filter(mean_water_temp < mean_water_temp_overall) %>%
  summarise(mean_fork_length = mean(ForkLength))


cold_mean <- cold_areas %>%
  summarise(mean(mean_fork_length))

cold_mean <- as.numeric(cold_mean)


# Difference in mean fork lengths for cold water sites vs. warm water sites 
point_est = as.numeric(warm_mean - cold_mean)
print(paste0("The point estimate is ", point_est ,"mm."))
[1] "The point estimate is 2.09782423716582mm."

\[PointEstimate = {MeanForkLengthWarm} - {MeanForkLengthCold}\] The point estimate for the difference in mean fork lengths for sites with warmer mean water temperatures versus colder mean water temperatures is 2.097824mm. However, this value could just be a difference in the samples due to random variability. Thus, I must calculate a measure of variability using a t-test.

Show code
# Just get the sample size (n) for the warm group 
n1 = warm_areas  %>% count()
# Just get the sample size (n) for the cold group
n2 = cold_areas %>% count()
# Get the standard deviation for the warm variable (mean fork length) which we are interested in.
s1 = warm_areas %>% summarize(sd(mean_fork_length, na.rm = TRUE))
# Get the standard deviation for the cold variable (mean fork length) which we are interested in.
s2 = cold_areas %>% summarize(sd(mean_fork_length, na.rm = TRUE))

# Compute the standard error.
# as.numeric makes it a value instead of a data point. 
SE = as.numeric(sqrt(s1^2/n1 + s2^2/n2))
Show code
# And now the test statistic:
zscore = (point_est - 0)/SE
print(paste0("The test statistic (z-score) is ", zscore ,""))
[1] "The test statistic (z-score) is 0.934362800214626"
Show code
# The observed mean fork lengths are 0.9343628 standard deviations away from our null hypothesis. 
Show code
crit_val = qnorm(0.025, lower.tail=FALSE)
Show code
# Now we can construct the 95% confidence interval:
ci_lower = round(point_est - crit_val*SE, 2)
ci_upper = round(point_est + crit_val*SE, 2)

print(paste0("95% probability that [", ci_lower, ", ", ci_upper, "] contains the difference in mean fork lengths of juvenile Chinook salmon for sites with warmer mean water temperatures versus sites with colder mean water temperatures."))
[1] "95% probability that [-2.3, 6.5] contains the difference in mean fork lengths of juvenile Chinook salmon for sites with warmer mean water temperatures versus sites with colder mean water temperatures."

The test statistic (z-score) tells us that the observed difference in mean fork lengths of juvenile Chinook salmon for sites with warmer mean water temperatures versus sites with colder mean water temperatures is 0.9343628 standard deviations above the null hypothesis of zero difference.

The 95% confidence interval tells us that there is a 95% probability that [-2.3, 6.5] contains the difference in mean fork lengths of juvenile Chinook salmon for sites with warmer mean water temperatures versus sites with colder mean water temperatures.

Show code
library(modelr)

# regression
model_1 <- lm(mean_fork_length ~ mean_water_temp, data = DJFMP_CHN_df_mean)

# create predictions and residuals
predictions <- DJFMP_CHN_df_mean %>% add_predictions(model_1) %>%
  mutate(residuals = mean_fork_length - pred)

# histogram
ggplot(data = predictions) + geom_histogram(aes(residuals), bins=25) +
  labs(title = "Residual Distribution") +
  theme_classic()
The distribution of residuals for water temperature on juvenile Chinook salmon fork length.

Figure 6: The distribution of residuals for water temperature on juvenile Chinook salmon fork length.

The residuals appear to be roughly mean zero based on the histogram above, but there is a long right tail due to outliers in the residuals distribution, so it is definitely not a perfectly normal distribution.

Next I will quantify the probability that the sample statistic differs from the null by using the lm() command to estimate the following simple linear regression:

\[ \text{mean fork length}_i = \beta_0 + \beta_1 \text{mean water temperature}_i + \varepsilon_i \]

Show code
lm(mean_fork_length ~ mean_water_temp, data = DJFMP_CHN_df_mean) %>% 
  summary()%>% 
  xtable() %>% 
  kable(caption = "Table 1: Linear Regression of Mean Water Temperature on 
        Mean Fork Length of Juvenile Chinook Salmon")
Table 1: Table 1: Linear Regression of Mean Water Temperature on Mean Fork Length of Juvenile Chinook Salmon
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.968201 10.3452215 -1.156882 0.2519831
mean_water_temp 4.944056 0.8334126 5.932303 0.0000002

The p-value is much lower than 0.05, so we can reject the null hypothesis that there is no effect of water temperature on mean fork length.

Intercept: On average, the predicted mean fork length of a juvenile Chinook salmon raised in zero degree Celsius water is -11.968201 (mm).

Coefficient on water temperature (WaterTemp): On average, the predicted mean fork length of juvenile Chinook salmon increases by 4.944056 (mm) for every one degree Celsius increase in water temperature.

R-squared: Our coefficient of determination (\(R^2\)) was 0.02533, which tells us that 2.53% of the variation in the mean fork length of juvenile Chinook salmon is explained by the mean water temperature at the site. This is expected because there are many factors in addition to water temperature that play a role in determining the mean fork length of juvenile Chinook salmon.

Show code
forklength_vs_watertemp <- ggplot(data = DJFMP_CHN_df_mean, 
                                  aes(x = mean_water_temp, 
                                      y = mean_fork_length)) +
  geom_point() +
  geom_smooth(method = 'lm', formula= y~x, se=FALSE, size=1) +
  labs(title = "Relationship Between Juvenile Chinook Salmon Mean Fork Length
and Mean Water Temperature",
       x = "Mean Water Temperature (°C) ",
       y = "Mean Fork Length (mm)") +
  theme_bw()

forklength_vs_watertemp
A linear regression of the relationship between mean water temperature and mean fork length of juvenile Chinook salmon in the San Francisco Estuary from 1976 to 2021.

Figure 7: A linear regression of the relationship between mean water temperature and mean fork length of juvenile Chinook salmon in the San Francisco Estuary from 1976 to 2021.

From our linear regression line, we can see the increase in the mean fork length of juvenile Chinook salmon as the mean water temperature increases.

Since \(p-value = 1.675e-07 < 0.05\) we reject the null that there is no difference in the mean fork length of juvenile Chinook salmon at sites with colder mean water temperatures compared to sites with warmer mean water temperatures. We can say there is a statistically significant difference (at the 5% significance level) in the mean fork length of juvenile Chinook salmon across sites with colder mean water temperatures compared to sites with warmer mean water temperatures.

Conclusion and Next Steps

Based on the findings of my analysis, the mean fork length of juvenile Chinook salmon in the San Francisco Estuary does significantly differ at sample sites with colder mean water temperatures compared to warmer mean water temperatures.

In future analyses, I would investigate the difference the mean fork length of juvenile Chinook salmon at sites with high nutrient levels versus sites with low nutrient levels. Nutrient levels play a key role in determining the growth rate of juvenile Chinook salmon because insects and other food sources are often more abundant in rivers with warmer water water temperatures than colder water temperatures. Additionally, using a multiple linear regression to investigate the role that nutrient levels and water temperature have on the mean fork length of juvenile Chinook salmon could provide valuable information for the conservation and management of this threatened species.

Github

The repository containing all of the code and data for this analysis can be found here.

References

Interagency Ecological Program (IEP), R. McKenzie, J. Speegle, A. Nanninga, J.R. Cook, and J. Hagen. 2021. Interagency Ecological Program: Over four decades of juvenile fish monitoring data from the San Francisco Estuary, collected by the Delta Juvenile Fish Monitoring Program, 1976-2021 ver 8. Environmental Data Initiative. https://doi.org/10.6073/pasta/8dfe5eac4ecf157b7b91ced772aa214a (Accessed 2021-11-30).

Ann Richter & Steven A. Kolmes (2005) Maximum Temperature Limits for Chinook, Coho, and Chum Salmon, and Steelhead Trout in the Pacific Northwest, Reviews in Fisheries Science, 13:1, 23-49, DOI: 10.1080/10641260590885861

Keith R. Marine & Joseph J. Cech Jr (2004) Effects of High Water Temperature on Growth, Smoltification, and Predator Avoidance in Juvenile Sacramento RiverChinook Salmon, North American Journal of Fisheries Management, 24:1, 198-210, DOI: 10.1577/M02-142

Brown, R.S., Oldenburg, E.W., Seaburg, A.G. et al. Survival of seaward-migrating PIT and acoustic-tagged juvenile Chinook salmon in the Snake and Columbia Rivers: an evaluation of length-specific tagging effects. Anim Biotelemetry 1, 8 (2013). https://doi.org/10.1186/2050-3385-1-8

NOAA Fisheries, Chinook Salmon (2021). https://www.fisheries.noaa.gov/species/chinook-salmon-protected#overview