Reflecting on Research: Data – when is enough enough
My research focuses on a case study: a single river with a catchment of more than 100 km², which is a tributary within a much larger catchment. The landscape sits on the border between the lowlands and the uplands: hilly but not mountainous, diverse, predominantly agricultural, rural and wet.
It is wet both in weather and in leakage: the hills leak. (Not a geography term. Probably.)
My project combines the social and physical sciences by incorporating land manager and landowner perspectives and expertise with traditional GIS (Geographic Information Systems) and computer modelling, to explore tree planting for NFM (natural flood management) as a land use change.
I have between one year and 18 months for data collection. This seems like a long time, but it really isn't.
My project breaks down into three key areas:
- Explore social perceptions, preferences and expertise to co-create understanding and knowledge of the catchment
- Model a known flood event and develop scenarios of alternative land use
- Analyse and evaluate the case study, including evaluation by participants
Each one of these stages could involve an epic amount of data collection.
So when is enough enough?
I spent all summer walking farms, gathering data and, most importantly, listening. But when you are considering people’s opinions, their perceptions and preferences, how many people do you talk to? When everyone is of equal value, shouldn’t you talk to everyone?! If that’s not possible (which it isn’t: not everyone would want to talk to me, I’m sure, let alone the time limits),
How do you decide who to target?
How do you decide when you have done enough?
Actually, the first of those two questions is the more easily tackled. There is a vast amount of literature on how to select participants: random selection, snowball selection, etc. In reality it often comes down to who you can contact, who answers the phone or who happens to be in when you knock on the door. I used a number of different methods to try to speak to a diverse, roughly representative range of people.
The second question is much, much harder. Strictly speaking, I think I have enough qualitative data to draw conclusions from what I’ve been told. Officially I have ‘finished’ this initial aspect of data collection, but there are areas of the catchment I’d really like to know more about, and people and landscapes of interest that I think might be really important. I’m not going to not speak to these people if I get the chance, but I also have to move on to the next stage of the project; computer models do not run themselves. Well, yes, they do, but I have to tell them what to do first. Actually, first I have to work out what I want the model to do, then work out how to tell it to do that. Urgh.
As I accept the situation and move on, I find that I am still tackling the same question: how much data is enough? I am using a ‘physically based, spatially distributed’ model; more simply, it uses information about the physical characteristics of the catchment and where things are in relation to each other to work out what’s happening. The alternative is to use a conceptual lumped model, which I’m not going to explain, but it’s more ‘pure maths’ and doesn’t allow me to explore land use change in quite the same detail. The ‘conceptual’ bit is a bit of a red herring, as the model I’m using is obviously conceptual too; I am not, for example, actually going to fill my computer with soil when I do the soil input. I do, however, have to put in a value for the soil quality (quantity, depth etc.). Here lies the difficulty: I am using physical data to represent the catchment, and to an extent the finer the detail of the data, the more accurate the model… sort of.
But there’s a line where the quality and quantity of the data runs out, and by running a more complex model beyond that line I’m more likely to create errors and uncertainties which I wouldn’t have had with the simpler model. So I have to ask not only ‘how much data is enough?’ but also ‘how much is too much?’!
I am still tackling this one – the endless problem with a PhD is that no one else really knows what you’re doing, so no one can really tell you what you do or don’t need until you need it.
For now, I would rather have the data and not need it than need it and not have it.
So I’ve ordered a pair of waders and I’m either going to sit in front of my supervisor and cry until he tells me what to do or (more likely) I’m going to get some help and go and walk in the river with some kit and collect some data that I may or may not need.
This is normal, right?