Albert Heijn Non-Dairy Milks

Randomisation of treatment and control groups is the go-to approach to making robust statistical inferences about causality. In experiments with interventions in the physical world, practical considerations often constrain how randomisation can be done, possibly leading to biased results or reducing the sensitivity of the experiments.

When experimenting with time-series data, it is possible to use stronger assumptions to mitigate these limitations and improve the sensitivity of experiments without biasing the results.

This piece compares standard A/B testing with two different estimators of treatment effect and shows how bootstrapping with historical data can be used to…


Building semantic point clouds for the Kaggle Lyft 3D Object Detection for Autonomous Vehicles Competition

Introduction

This post details the approach that Stefano Giomo and I used for our entries into the recent Kaggle Lyft 3D object detection for autonomous vehicles competition (https://www.kaggle.com/c/3d-object-detection-for-autonomous-vehicles).

This competition used data captured by Lyft vehicles equipped with multiple cameras and LIDAR sensors. The vehicles captured hundreds of 20-second scenes on the roads of Palo Alto. The aim of the competition was to place 3D bounding volumes around different classes of objects in these scenes.

We trained a UNet model on a 2D bird's-eye view representation of the data. The 2D representation was created by a number of preprocessing…
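As a rough illustration of what a 2D bird's-eye view of LIDAR data can look like, here is a minimal sketch that counts points per cell of a top-down grid. It is not the pipeline from the post: the function name `points_to_bev`, the grid extent and the cell size are all assumptions made for this example.

```python
import numpy as np

def points_to_bev(points, x_range=(-50.0, 50.0), y_range=(-50.0, 50.0), cell_size=0.5):
    """Count LIDAR points per cell of a top-down grid.

    points: array of shape (n, 3) holding x, y, z coordinates in metres.
    The ranges and cell size are illustrative values, not the post's settings.
    """
    n_x = int(round((x_range[1] - x_range[0]) / cell_size))
    n_y = int(round((y_range[1] - y_range[0]) / cell_size))
    # histogram2d drops points that fall outside the requested range
    counts, _, _ = np.histogram2d(
        points[:, 0], points[:, 1],
        bins=(n_x, n_y),
        range=(x_range, y_range),
    )
    return counts

# Two points near the origin and one far outside the grid
example_points = np.array([
    [0.1, 0.1, 0.0],
    [0.2, 0.2, 1.0],
    [100.0, 0.0, 0.0],
])
bev = points_to_bev(example_points)
```

The resulting array can be treated as a single-channel image; further channels (for example, height bands or class labels) could be stacked in the same way.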


Using eo-learn and fastai to identify crops from multi-spectral remote sensing data

A section of the Orange River, South Africa: colour imagery and NDVI from Sentinel-2 and target masks from Zindi’s Farm Pin Crop Detection Challenge

Introduction

This post describes how I used the eo-learn and fastai libraries to create a machine learning data pipeline that can classify crop types from satellite imagery. I used this pipeline to enter Zindi’s Farm Pin Crop Detection Challenge. I may not have won the contest, but I learnt some great techniques for working with remote-sensing data, which I detail in this post.

Here are the preprocessing steps I followed:

  1. divided an area of interest into a grid of ‘patches’,
  2. loaded imagery from disk,
  3. masked out cloud cover,
  4. added NDVI and Euclidean norm features,
  5. resampled the imagery to regular time intervals,
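To make step 4 concrete, here is a minimal sketch of the two derived features using plain NumPy rather than eo-learn. NDVI is computed as (NIR - red) / (NIR + red), and the Euclidean norm is taken across all bands for each pixel. The function name `add_ndvi_and_norm`, the band ordering and the small `eps` term are assumptions for this example, not the post's code.

```python
import numpy as np

def add_ndvi_and_norm(bands, red_idx, nir_idx, eps=1e-9):
    """Return per-pixel NDVI and Euclidean norm.

    bands: float array of shape (height, width, n_bands).
    red_idx, nir_idx: positions of the red and near-infrared bands
    (these depend on how the imagery was loaded, so they are assumptions).
    """
    red = bands[..., red_idx]
    nir = bands[..., nir_idx]
    # eps avoids division by zero where both bands are zero
    ndvi = (nir - red) / (nir + red + eps)
    # Euclidean norm over the band axis: an overall brightness measure
    norm = np.linalg.norm(bands, axis=-1)
    return ndvi, norm

# A tiny 1 x 2 pixel patch with two bands ordered [red, nir]
patch = np.array([[[0.1, 0.5], [0.3, 0.3]]])
ndvi, norm = add_ndvi_and_norm(patch, red_idx=0, nir_idx=1)
```

Pixels with dense vegetation reflect strongly in the near-infrared, so their NDVI moves towards 1, while pixels with equal red and NIR reflectance sit at 0.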

Simon Grest

Data Scientist at Albert Heijn in the Netherlands
