
I want to model waste generation in a city using data from its waste containers. I have the location and fill level of each container over time. My idea is to use the average daily amount of waste in each container and build some sort of surface over the city that would tell me the average amount of waste a container would collect if it were placed at those coordinates. My first thought was to interpolate a surface through the points in 3D space (latitude, longitude, average), but I don't think this would give reliable results.
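
To illustrate, this is roughly the kind of interpolation I had in mind (a quick sketch with `scipy.interpolate.griddata`; the container data here are made up):

```python
import numpy as np
from scipy.interpolate import griddata

# made-up container data: (lon, lat) positions and average daily fill level
rng = np.random.default_rng(0)
points = rng.uniform(size=(100, 2))   # replace with real container coordinates
values = rng.uniform(size=100)        # replace with real per-container averages

# regular grid over the bounding box of the containers
grid_lon, grid_lat = np.meshgrid(
    np.linspace(points[:, 0].min(), points[:, 0].max(), 200),
    np.linspace(points[:, 1].min(), points[:, 1].max(), 200),
)

# linearly interpolated surface of "average fill if a container stood here"
surface = griddata(points, values, (grid_lon, grid_lat), method="linear")
```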

I'm not sure whether this is a good approach. I would appreciate recommendations on what types of models to use and whether there is a better way to do it. I plan to code it in Python, so any tips on that are also appreciated.

  • Interesting question. You have a time series component in here, and the fill levels likely depend on the day of the week and possibly the hour of the day, depending on what time granularity your data has. You would also need to model that the bins are emptied at some point in time. Finally, your analysis goal raises a question: won't placing your bins next to existing ones cannibalize them, so that the fill levels afterwards are lower in all of them? Placing a new bin should not create new trash. – Stephan Kolassa, Commented May 14 at 8:23
  • @StephanKolassa The idea is to model waste generation, that is, to understand in a way how people throw away their trash; that's why I was thinking of considering, for example, the average amount of waste before the bins are emptied. It is true that fill levels will depend on the day of the week, but I don't think that affects this case. Finally, I'm not thinking of placing new bins but of relocating the existing ones to optimize them. It is also possible that fewer bins could collect the same amount of waste, and changing their positions could improve the collection routes.
    – pato
    Commented May 14 at 8:33

2 Answers


What you can do is similar to curve-fitting an unknown distribution. For one-dimensional data $x_1, \ldots, x_n$, take an interval $[a,b]$ containing the minimum and maximum of the data and cut it up into $k$ sub-intervals. Then count how frequently the data fall into each sub-interval. After you appropriately rescale this histogram, you get a PDF (probability density function) estimate for the data.

Often the PDF curve will have a lot of sharp corners, but you can fix this with convolution. More precisely, you take each point on the curve and replace it with the average of itself and its neighboring points. This procedure produces a smoother-looking PDF curve.
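
As a minimal sketch of this 1-dimensional procedure in Python (NumPy's `histogram` with `density=True` does the rescaling; the data below are a synthetic stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)              # stand-in for the data x_1, ..., x_n

# histogram over [a, b] with k sub-intervals; density=True rescales the counts
# so the bars integrate to 1, i.e. a piecewise-constant PDF estimate
k = 30
pdf, edges = np.histogram(x, bins=k, range=(x.min(), x.max()), density=True)

# smooth by replacing each value with the average of itself and its neighbours
# (a convolution with a small uniform kernel)
pdf_smooth = np.convolve(pdf, np.ones(3) / 3, mode="same")
```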

What you are doing for your surface is similar, except that you have two-dimensional data. You would enclose all the data in a large rectangle $[a,b]\times[c,d]$, cut the rectangle up into smaller sub-rectangles, and draw a histogram plot. Renormalize this plot to get a surface and then smooth the surface out using convolution.
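
A similar sketch for the 2-dimensional case, again on synthetic data, using `numpy.histogram2d` and a small moving-average filter from SciPy:

```python
import numpy as np
from scipy.ndimage import uniform_filter

rng = np.random.default_rng(0)
xy = rng.normal(size=(1000, 2))       # stand-in for the two-dimensional data

# 2-D histogram over [a, b] x [c, d]; density=True renormalizes it to a surface
hist, xedges, yedges = np.histogram2d(xy[:, 0], xy[:, 1], bins=40, density=True)

# smooth the surface by averaging each cell with its neighbouring cells
# (a 2-D moving-average convolution)
surface = uniform_filter(hist, size=3)
```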

I would suggest writing your code for the 1-dimensional curve first; it is fairly short. You can post it here so we can improve it for you, and the 2-dimensional surface is then a natural generalization obtained by simply adding an extra for-loop.


It sounds like what you have is generally referred to as point-referenced spatial data.

The default approach to this problem is Gaussian process regression, also known as kriging: the idea is to interpolate over the spatial region using a Gaussian process. In this case you would be estimating the average waste at any given point in your domain.
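
As a rough sketch of how this could look in Python with scikit-learn's `GaussianProcessRegressor` (the coordinates and waste values below are made-up placeholders, and the kernel choice is only illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 2))        # container coordinates (lon, lat), made up
y = rng.uniform(size=100)             # average waste per container, made up

# RBF kernel for spatial correlation plus a white-noise term for measurement noise
kernel = 1.0 * RBF(length_scale=0.1) + WhiteKernel()
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)

# predict the surface (and its uncertainty) on a grid of new locations
lon, lat = np.meshgrid(np.linspace(0, 1, 50), np.linspace(0, 1, 50))
grid = np.column_stack([lon.ravel(), lat.ravel()])
mean, std = gp.predict(grid, return_std=True)
```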

For an introduction to these methods I’d recommend either Cressie and Wikle, or Banerjee for a Bayesian approach.
