François, Waldner
[UCL]
Jacques, Damien Christophe
[UCL]
Low, Fabian
[Map Tailor Geospatial]
The ground truth data sets required to train supervised classifiers are usually collected so as to maximize the number of samples under time, budget and accessibility constraints. Yet the performance of machine learning classifiers is sensitive to, among other factors, the class proportions of the training set. In this letter, the joint effect of the number of calibration samples and the class proportions on classification accuracy was systematically quantified using two state-of-the-art machine learning classifiers (random forests and support vector machines). The analysis was applied in the context of binary cropland classification and focused on two contrasting agricultural landscapes. Results showed that the classifiers were more sensitive to class proportions than to sample size, though sample size had to reach 2,000 pixels before its effect leveled off. Optimal accuracies were obtained when the training class proportions were close to those actually observed on the ground. The synthetic minority over-sampling technique (SMOTE) was then implemented to artificially regenerate the native class proportions in the training set. This resampling method increased accuracy by up to 30%. These results have direct implications for (i) informing data collection strategies and (ii) optimizing classification accuracy. Though derived for cropland mapping, the recommendations are generic to the problem of binary classification.
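The resampling step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, which the abstract does not detail. It assumes that the under-represented class is over-sampled by SMOTE-style interpolation between nearest neighbours until its share of the training set matches a target proportion observed on the ground. The function names `smote` and `n_synthetic_for_target`, the default `k=5`, and the example counts are all hypothetical.

```python
import numpy as np


def n_synthetic_for_target(n_minority, n_majority, target_fraction):
    """Synthetic minority samples needed so the minority share reaches target_fraction."""
    # Solve (n_minority + s) / (n_minority + s + n_majority) = target_fraction for s.
    needed = target_fraction * n_majority / (1.0 - target_fraction) - n_minority
    return max(0, int(round(needed)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)
    # Squared Euclidean distances between all pairs of minority samples.
    dist = ((minority[:, None, :] - minority[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a random base sample, then one of its k nearest neighbours.
    base = rng.integers(0, n, size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place each synthetic sample at a random point on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])


# Hypothetical training set: 50 cropland pixels, 950 non-cropland pixels,
# and an assumed ground proportion of 30% cropland.
rng = np.random.default_rng(42)
cropland = rng.normal(loc=1.0, scale=0.5, size=(50, 4))
n_new = n_synthetic_for_target(50, 950, 0.30)
synthetic = smote(cropland, n_new)
balanced_share = (50 + n_new) / (50 + n_new + 950)
```

Because every synthetic sample is a convex combination of two existing samples, the new points stay within the range of the original minority features. Whether that property suits a given feature space is a design choice to verify on the data at hand.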
Bibliographic reference
François, Waldner ; Jacques, Damien Christophe ; Low, Fabian. The impact of training class proportions on binary cropland classification. In: Remote Sensing Letters, Vol. 8, no. 12, p. 1122-1131 (06 Aug 2017)
Permanent URL
http://hdl.handle.net/2078.1/188581