Predicting the nationwide outmigration timing of Atlantic salmon (Salmo salar) smolts along 12 degrees of latitude in Norway

Accurate predictions about transition timing of salmon smolts between freshwater and marine environments are key to effective management. We aimed to use available data on Atlantic salmon smolt migration to predict the emigration timing in rivers throughout Norway.


| INTRODUCTION
Atlantic salmon born in freshwater move to the ocean following smoltification to exploit the productivity of marine waters (Klemetsen et al., 2003; McCormick et al., 1998). The post-smolt life stage when salmon are exiting the rivers and transitioning to life in the marine environment is a critical bottleneck in their survival (Lothian et al., 2018; Stich et al., 2015; Thorstad et al., 2012). The timing of migrations has been adapted by evolutionary processes to match optimal conditions for survival (Lennox et al., 2016); for Atlantic salmon smolts, this window of opportunity is narrow in order to match physiological preparedness for migration with environmental conditions (McCormick et al., 1998). Optimal conditions are not directly observable by animals; therefore, they rely on secondary cues to synchronize their migration timing (e.g. Duston & Saunders, 1990). Deviations from the optimum time frame can yield high mortality when anadromous species enter the ocean to encounter suboptimal conditions (Scheuerell et al., 2009).
For example, Hansen and Jonsson (1989) released smolts from the River Imsa throughout the year and observed that returning adults came from groups released in the spring, illustrating the sensitive time window within which smolts must initiate their seaward migration for their long-term success.
Evidently, there is strong pressure for Atlantic salmon smolts to migrate at appropriate times (Otero et al., 2014). Anthropogenic effects that have negative impacts on post-smolts during this window are of great concern to salmon conservation (McCormick et al., 1998; Thorstad et al., 2012). Consequently, managers must have appropriate tools to predict the timing of the smolt migration, to evaluate how such effects may overlap with it and to give advice on when mitigation efforts are most effective. Yet, the most frequently used tool for this is presently the designation of index rivers, nearby watersheds that are monitored and used to predict the timing of the smolt migration in all other proximate systems (e.g. Johnsen et al., 2021; Kristoffersen et al., 2018). Models have been developed to predict the timing of smolt migrations in single rivers and generally show that temperature and discharge are important in explaining the timing of smolt migration in rivers (Hansen & Jonsson, 1989; Whalen et al., 1999).
Models across several rivers that attempt to predict general patterns of smolt migration have also been made, but with variable overarching goals. For example, Hvidsten et al. (1998) used data from five rivers to postulate that salmon migrate to sea when marine temperature exceeds 8°C. More recently, Otero et al. (2014) synthesized data throughout the range of Atlantic salmon showing a temporal shift in the timing of outmigration, and a correlation with both sea surface temperature and freshwater temperature.
A model that can be used to accurately predict outmigration timing on relevant management scales (such as aquaculture production zones in Norway) would clearly be of great importance for the conservation of Atlantic salmon. Generating predictions for the timing of smolt migration can provide a tool for monitoring the status of Atlantic salmon rivers and observing changes across time. An effective predictive model could be used to make predictions for rivers that are not monitored due to access or financing challenges (i.e. out-of-sample). We collated smolt migration data from Norwegian Atlantic salmon rivers to generate a model of the spatial and environmental features predicting the timing of smolt migration for years between 1984 and 2018. We used regression modelling to test the relationships and validate the predictive power of the model, generate out-of-sample predictions and compare the predicted timings to estimates used by national management councils.

| Smolt migration data
The goal of our study was to collate available smolt migration data from Norway, in an attempt to make a predictive model of the timing of smolt migration in this area. Data were extracted from three sources: (a) published scientific articles, (b) Norwegian reports, and (c) unpublished data available from the authors' research institutions. Data were updated from previous compilations using the same methodology (Ugedal et al., 2014). The database does not contain daily counts but summarizes the timing of smolt emigration by percentiles, recording the dates of 25% passage, 50% passage and 75% passage (Table 1).
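As an illustration of what the percentile summaries represent, the following minimal Python sketch derives 25%, 50% and 75% passage dates from a series of daily counts. It is not the authors' procedure (the database holds only the summarized dates, and the original analysis was carried out in R); the function name `passage_dates` and the example counts are hypothetical.

```python
from datetime import date, timedelta


def passage_dates(daily_counts, start, fractions=(0.25, 0.50, 0.75)):
    """Return the first date on which cumulative passage reaches each fraction.

    daily_counts: smolt counts for consecutive days, beginning on `start`.
    """
    total = sum(daily_counts)
    if total <= 0:
        raise ValueError("no smolts counted")
    remaining = sorted(fractions)
    result = {}
    cumulative = 0
    for offset, count in enumerate(daily_counts):
        cumulative += count
        # One day may satisfy several fractions at once.
        while remaining and cumulative >= remaining[0] * total:
            result[remaining.pop(0)] = start + timedelta(days=offset)
    return result


# Hypothetical counts for ten consecutive days starting 1 May 2018.
counts = [0, 2, 5, 13, 10, 25, 25, 10, 7, 3]
dates = passage_dates(counts, date(2018, 5, 1))
```

Because the dates depend on the cumulative total, a trap installed after migration has begun would shift all three percentiles later, which is one reason the non-standardized deployment noted below matters.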
Smolt migration was monitored in 348 river years, comprising 47 rivers between 58.02 and 70.50 degrees latitude from 1984 to 2018 (Figure 1). Monitoring was conducted using different methods of observation: traps (N = 252 river years), video counting (N = 84 river years) and tagging (N = 12 river years). Note that Eio and Vigda are the only two rivers that had multiple counting methods, so they are counted multiple times. The placement of these monitoring tools was not consistent among rivers nor was the timing of deployment standardized.

| River morphology data
Morphological data from the river catchments were downloaded from Nevina (http://nevina.nve.no/). This includes elevation data from the catchment, land composition (e.g. per cent of catchment covered by agriculture, forest, lake and urban areas) and air temperature throughout the year (summer, winter, July, August temperatures). In addition, modelled average discharge, average rainfall and average air temperature were extracted from each of the catchments from the same database.

| Annual environmental data
TABLE 1 Summary of modelled variables including those in the full and reduced models. Rivers in Norway are characterized by ID numbers assigned by the national resource authority, which are included for reference. Observed dates of 25% outmigration (±SD) from the data are included along with the number of years of data, the counting method, river coordinates, elevation and grade, whether the river has lakes, and the mean air temperatures and river flows extracted for modelling (see below for details)

Seasonal water temperature measurement data were not available for the majority of rivers. However, air temperature data were available from weather stations throughout Norway, and a model was fitted for each individual year to explain the air temperature recorded by the longitude, latitude and altitude of the station using the gam function in the mgcv package (Wood, 2017). The gam models were then carried forward to predict the air temperatures for each river in each year using coordinates of the river mouth, and average grade of the river using the predict.gam function. Air temperature for each river in each year was summarized as the average between 1 January and 31 March. Modelled water discharge data were available for each river from the NorKyst800 model (Albretsen et al., 2011). The Norwegian river discharges were modelled by the NVE (Norwegian Water Resources and Energy Directorate) using a distributed version of the HBV model with 1 km horizontal resolution (Beldring et al., 2013; Huang et al., 2019). We summarized water discharge for each river in each year by extracting the first day of the year when the flow first reached 10% and 25% of the maximum flow from 1 March to 18 July, which was considered to be the maximal likely window for onset of smolt migration. We used different temporal windows for temperature and flow because temperature should control physiological readiness (proximate cause) and flow should drive the exact timing of migration (ultimate cause). Both flow extractions gave very similar model outputs.
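The two annual summaries described in this section can be sketched as follows. This is a minimal Python illustration rather than the authors' code (the original analysis was carried out in R); the function names and example values are hypothetical, and treating "first hit" as "first day on which flow is at or above the threshold" is an assumption.

```python
from datetime import date


def first_threshold_day(flows_by_date, year, fraction):
    """Day of year on which flow first reaches `fraction` of the window maximum.

    The window runs from 1 March to 18 July; flows outside it are ignored.
    """
    start, end = date(year, 3, 1), date(year, 7, 18)
    window = {d: q for d, q in flows_by_date.items() if start <= d <= end}
    if not window:
        raise ValueError("no flow data inside the window")
    threshold = fraction * max(window.values())
    for d in sorted(window):
        if window[d] >= threshold:
            return d.timetuple().tm_yday


def mean_q1_temperature(temps_by_date, year):
    """Mean air temperature between 1 January and 31 March."""
    values = [t for d, t in temps_by_date.items()
              if date(year, 1, 1) <= d <= date(year, 3, 31)]
    return sum(values) / len(values)


# Hypothetical flow (m3/s) and air temperature (deg C) records for 2018.
flows = {
    date(2018, 2, 20): 500.0,  # before the window, ignored
    date(2018, 3, 1): 4.0,
    date(2018, 4, 15): 12.0,
    date(2018, 5, 10): 30.0,
    date(2018, 6, 1): 100.0,   # maximum inside the window
    date(2018, 7, 20): 400.0,  # after the window, ignored
}
temps = {
    date(2018, 1, 15): -6.0,
    date(2018, 2, 15): -4.0,
    date(2018, 3, 15): -2.0,
    date(2018, 4, 15): 3.0,    # outside the first quarter, ignored
}

day_10 = first_threshold_day(flows, 2018, 0.10)
day_25 = first_threshold_day(flows, 2018, 0.25)
q1_temp = mean_q1_temperature(temps, 2018)
```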

| Data analysis
We started with a large dataset including physical river characteristics, land use, geographic and climate variables. We initially considered a principal component analysis (PCA) to summarize the data but opted to manually select variables of interest because (a) many were highly correlated, and (b) we wanted to understand the relationship between smolt run timing and specific variables and not a hybrid variable produced by PCA. Therefore, we discarded land use variables and several temperature variables that were highly correlated with the average air temperature variable (1 January to 31 March) that we retained.
We constructed generalized additive models using the gam function in the R package mgcv (Wood, 2017) to model the influence of longitude and latitude at the river mouth, mean air temperature in the first quarter of the year and first increase in spring discharge (25% as explained above). The response variable of interest was the timing of 25% outmigration from the river, which was selected because it was more complete than the 50%, 75% and 100% estimates. However, these were all highly correlated so this selection should not influence the interpretations from the model, except that had we modelled 50% emigration the estimate would be shifted later in the year. We constructed two models, one including more variables and one with simpler, more accessible data. The first model included spatial variables latitude and longitude as a combined smoother (Pedersen et al., 2019) and linear effects for mean air temperature, date of flow being 10% of the annual maximum for the first time, the river height and gradient, whether the river included lakes in the anadromous section and the sampling method (video, PIT tagging, acoustic tagging, smolt screw trap, Wolf trap), in addition to random effects of river and year to account for measurements coming from the same river and in the same year (specified as smoothers using the argument bs = "re"). The second model was simpler, with only the smooth terms for latitude and longitude together, linear effects of temperature and flow, counting method and a random intercept for each river and year. Predictions were generated using the predict.gam function (Wood, 2017).
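For reference, the simpler model can be written out in symbols. This is a sketch only: the notation is introduced here for illustration and does not appear in the original, and the Gaussian error structure with identity link (the default in mgcv) is an assumption that the text does not state.

```latex
\[
y_{ij} = \beta_0 + f(\mathrm{lon}_i, \mathrm{lat}_i) + \beta_T\, T_{ij} + \beta_Q\, Q_{ij}
         + \gamma_{m(i,j)} + b_i + c_j + \varepsilon_{ij},
\]
\[
b_i \sim N(0, \sigma_b^2), \qquad c_j \sim N(0, \sigma_c^2), \qquad
\varepsilon_{ij} \sim N(0, \sigma^2).
\]
```

Here $y_{ij}$ is the date of 25% outmigration in river $i$ and year $j$, $f$ is the joint smoother on longitude and latitude, $T_{ij}$ is the mean air temperature from 1 January to 31 March, $Q_{ij}$ is the date on which flow first reaches the chosen fraction of its maximum, $\gamma_{m(i,j)}$ is the effect of the counting method, and $b_i$ and $c_j$ are the random intercepts for river and year. Out-of-sample predictions correspond to setting $b_i = c_j = 0$.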
Predictions were generated in sample and out of sample to estimate the run timing in rivers where we did not have estimates of the date of 25% smolt migration. For the out-of-sample predictions, we set the random effects of year and river to zero. A mixed-effects model of the in-sample predictions against the known values for the 25% smolt migration date was run with river as a random effect with the lme function in the nlme package (Pinheiro et al., 2019). Predictions were extracted for 2018 and compared with the national estimate of 25% smolt outmigration using a linear model with the lm function. Model performance was assessed by k-fold cross-validation by splitting the dataset into ten groups, training the simplified GAM on nine subsets and using the tenth subset as a test set to compare predictions to the true values, rotating through all ten combinations of models and testing the correlation of predictions generated from the 10 models against the known values.
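The cross-validation procedure can be sketched in a few lines. The Python example below is illustrative only: it substitutes an ordinary least-squares line for the GAM (which was fitted in R with mgcv), all function names are hypothetical, and the data are simulated rather than drawn from the study.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (a stand-in for the GAM)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def k_fold_predictions(xs, ys, k=10, seed=1):
    """Predict every observation from a model trained on the other k-1 folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    predictions = [None] * len(xs)
    for test_idx in folds:
        held_out = set(test_idx)
        train = [i for i in indices if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test_idx:
            predictions[i] = intercept + slope * xs[i]
    return predictions


def r_squared(observed, predicted):
    """Squared Pearson correlation between observations and predictions."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


# Simulated data: colder winters give later migration (day of year).
rng = random.Random(0)
temps = [-8 + 0.5 * i for i in range(40)]
days = [150 - 3 * t + rng.gauss(0, 2) for t in temps]

cv_predictions = k_fold_predictions(temps, days, k=10)
cv_r2 = r_squared(days, cv_predictions)
```

Each observation is predicted exactly once, by a model that never saw it, so the resulting correlation reflects predictive rather than in-sample fit.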

| Data visualization
Data were plotted using ggplot2 (Wickham, 2016). The map was accessed from the cshapes R package with the cshp function (Weidmann & Gleditsch, 2016).

| RESULTS
We used 348 observations of the date of 25% smolt emigration from 47 rivers from 1984 to 2018 to derive our model (Figure 1).
Observations ranged from 24 March (Vikja River, 2014) to 4 July (Alta River, 2005). The full and the simplified model fit similarly, but the simpler model was better (ΔAIC = 2.7). We therefore proceeded with the simplified GAM model that included the random effect for river, and smoothers for latitude and longitude together, mean air temperature in the first quarter of the year and the first date of 25% flow. The model had a strong fit to the data (R² = .86; deviance explained = 88%). The smoother on longitude and latitude was highly significant (F = 6.64, p < .01). Air temperature in the first quarter was significant (F = −4.81, p < .01), but flow was not significant (F = −0.93, p = .35). The counting method was also significant, with estimates from tagging studies yielding earlier results than video counting (t = 2.74, p = .01) but not trap catches (t = 1.66, p = .10). Predictions were generated in-sample to determine how well the predictions from the model fit the known data, showing a significant correlation with the observations (t = 15.37, p < .01; Figure 2).
Out-of-sample predictions were then generated to estimate outmigration timing in river years throughout Norway (N = 1,753) based on the longitude, latitude, air temperature and flow in a given year; the capture method was set to the factor of trap (Figure 3). There was a strong relationship between GAM-predicted timing and the national estimate of outmigration timing for 2018 (t = 46.45, p < .01, R² = 84%); the model fit (intercept 39.73, slope 0.78) suggested that the national estimates were generally earlier than the GAM predictions in southerly latitudes, but were later in the north (Figure 4). Tenfold cross-validation of the model predictions against observations revealed an R² of 83%.
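The reported intercept and slope imply a date at which the two estimates coincide. The short Python sketch below works through that arithmetic under two assumptions that the text does not state explicitly: that the GAM prediction was the response and the national estimate the predictor, and that both are expressed as day of year. The function name is hypothetical.

```python
INTERCEPT = 39.73  # reported intercept
SLOPE = 0.78       # reported slope


def gam_from_national(national_day):
    """GAM-predicted day of year implied by a national estimate (assumed direction)."""
    return INTERCEPT + SLOPE * national_day


# The two estimates agree where x = INTERCEPT + SLOPE * x.
crossover = INTERCEPT / (1 - SLOPE)

early_example = gam_from_national(130)  # an early (southern) national estimate
late_example = gam_from_national(190)   # a late (northern) national estimate
```

Under these assumptions the lines cross at roughly day 181; for earlier national estimates the GAM prediction is later than the national one, and for later national estimates it is earlier, which matches the south-to-north pattern described above.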

| DISCUSSION
We constructed a model that effectively described the variance associated with the timing of 25% smolt migration out of Norwegian rivers. The model was robust, showing good predictive accuracy, based on the spatial and environmental model inputs. We derived air temperature from predictions of a separate generalized additive model that interpolated temperatures based on weather stations throughout Norway, which was a strong predictor of smolt migration timing. These results are highly relevant to understanding Atlantic salmon ecology and managing this culturally and ecologically important migration that is threatened by human development and climate change (Otero et al., 2014).
Estimated air temperature was a significant predictor of the timing of smolt migration from the rivers that we modelled in Norway.
Spatial effects were very strong predictors of the outmigration timing, and given that temperature is correlated with latitude, this may have influenced the interpretation of the effect of temperature. Temperature controls physiological rates and development of animals, particularly ectothermic species such as most fishes (Brett, 1971; Fry, 1971). In Atlantic salmon, juvenile growth in rivers is controlled by water temperature (Elliott & Hurley, 1997). There is also a strong relationship between the smolt age, that is the time it takes for a juvenile salmon to develop to the stage at which it initiates seaward migration, with photoperiod and water temperature (Metcalfe & Thorpe, 1990). Temperature is believed to have a role in the timing of smolt migrations (Zydlewski et al., 2005). Jonsson and Ruud-Hansen (1985) found that water temperature was a significant predictor of smolt run timing in the River Imsa, measuring temperature from 9 April to 16 May but suggesting that the models were relatively insensitive to the period over which temperature data were collected. We used air temperature rather than water temperature because it was more accessible from historic records. In addition, air temperature can easily be accessed for any river in Norway or estimated by interpolation using the same modelling approach as we implemented for our study, making it a versatile tool (Benestad et al., 2019).

FIGURE 2 (a) Observed and predicted dates of 25% smolt emigration for each river, coded by colour. Coloured lines represent the fit for individual rivers between observed and predicted values.

FIGURE 3 Out-of-sample predictions (i.e. predictions made for rivers not included in the model) for the mean timing of outmigration in 401 Norwegian rivers. The generalized additive model from which the predictions were derived included mean air temperature in the first quarter of the year, water flow, longitude and latitude, and monitoring method (set to the factor level "trap"). Predictions are shown for 2018 temperature/flow values. The random intercepts of year and river were set to zero to avoid adjustment for these effects.

We had access to data from many rivers collected across multiple years that used different counting methods to monitor the smolt run. Standardizing these methods would be ideal to construct a robust model, for example, using the same monitoring methods with the same installation dates across years in different rivers. Tagging also then applies to a relatively small proportion of the total population, which can bias results. Tagged fish may also exit differently than untagged fish, potentially biasing PIT and acoustic methods (Hulbak et al., in review). High flows can flood nets and traps and make them ineffective, which can also affect estimates. Placement of a trap, weir or antenna also affects the timing estimation; all are most effective at narrow points to increase detection probability but should also be as close to sea as possible for a representative estimate of timing.
We expected that water discharge would influence the outmigration timing expressed by the Atlantic salmon smolts; however, this did not have a significant effect on outmigration in our model.
Whereas water temperature affects salmon development by acting directly upon the physiology of the fish such that accumulated thermal units across time presumably influence the preparedness of a fish to migrate, water discharge is more likely a threshold cue such that peak flows will provide a stimulus to the fish to migrate downriver (Urke et al., 2013). Snow melts later in cold, northern rivers, and therefore, the peak floods are expected to be later on average than in the south; yet, we observed no correlation between the average temperature and the peak discharge variables that we included in our model. This might be surprising given that many studies have shown a very clear response to either discharge or increase in discharge when studying outmigration of salmon smolts (Hvidsten et al., 1995). However, it is important to note that 25% outmigration may not change much even though the timing of first migration may change a lot. This may be because not all of the population migrates during the first increase in discharge. The proportion not migrating is likely controlled by the developmental rate, which is influenced by temperature during the winter season preceding migration; this temperature development probably transpires in a more nonlinear fashion than we could model and will require some experimental approaches to derive empirically in the future. Consequently, the median or 25% outmigration time may not be as sensitive to these changes although individual fish may respond very clearly to discharge triggers.
Predictive models are important for both improving the fundamental understanding of ecological processes and making effective evidence-based management decisions. Perhaps the best example of application of a predictive model is within the management of effects of salmon lice (Lepeophtheirus salmonis Krøyer 1837) on the survival of out-migrating salmon smolts. Models that attempt to estimate the salmon lice-induced mortality on out-migrating post-smolts have all identified the timing of outmigration of the modelled smolts as one of the most sensitive parameters (Johnsen et al., 2021; Kristoffersen et al., 2018); this is because sea lice infestation pressure is strongly temperature-dependent and increases dramatically throughout the season in areas with high fish farming activity. This is partially due to the density dependence in the sea lice population on fish farms (Jansen et al., 2012) and because the development rates in critical life stages of salmon lice increase nonlinearly with temperature (Stien et al., 2005). Consequently, modelled parasite-induced mortality can vary from negligible to above 30% if smolts migrate two weeks later. In Norway, these models have become even more critical for management and the salmon farming industry as a new management system that relates allowable biomass in fish farms to estimated parasite-induced mortality has been implemented. Timing of outmigration, which is such an essential part of this equation, has heretofore been based on a subjective evaluation of outmigration time based on data from nearby sentinel rivers.
Our model is a step forward in making an objective evaluation of the outmigration time that can be used in models that are used in management. It is important to note, however, that the differences between the new predictive model and the national evaluation were not very large; the difference was greatest in the southern rivers, but our model can be parameterized with temperatures and flow in a given year to yield more accurate estimates.

| CONCLUSIONS
Managing Atlantic salmon populations is a challenge for many nations, regions and municipalities. In Norway, there are hundreds of salmon-producing rivers that cannot all be monitored effectively. Yet, monitoring is increasingly important given that a progressively destabilizing climate means that changes to temperature regimes and animal populations can be expected to occur rapidly. Models using local geographic and environmental information can be used to better understand macroecological processes such as migration and manage threats to salmon populations more proactively. Salmon provides a great case study for other migratory species for which migration timing is critical to management; generalized additive models accounting for the spatial variation seem to be strong tools for generating predictions that can inform management. Our results show the potential for using accessible parameters to estimate the timing of smolt migrations in rivers, a tool that has the capacity to make contributions to the management of industry particularly through estimation of sea lice burdens on different populations. Our model is refinable with additional data and standardization of collection methods, and more data would allow for improved model calibration using training and test sets. Indeed, the model could be updated with additional data from other countries to expand our understanding of smolt run timing. Nonetheless, this model with available data represents an improvement to the methodologies used by fisheries management agencies to estimate run timing of Atlantic salmon smolts.

ACKNOWLEDGEMENTS
The authors thank Gavin Simpson for the helpful Stack Exchange post detailing how to set random effects to zero for predictions with GAM models (https://stats.stackexchange.com/questions/131106/predicting-with-random-effects-in-mgcv-gam). This work was carried out with financing and support from the Ministry of Trade, Industry and Fisheries.

CONFLICT OF INTEREST
The authors wish for readers to be aware that none of their interests are in conflict with the content of the manuscript.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13285.

DATA AVAILABILITY STATEMENT
Data are available in the Dryad Repository: https://doi.org/10.5061/dryad.p2ngf1vq9. The file is a post-processed (i.e. cleaned and joined with relevant metadata) spreadsheet. The spreadsheet includes river-specific smolt migration estimates joined with river metadata, temperature and flow as used in the models.