SynapseML version
1.0.13
System information
- Language version: Python 3.9
Describe the problem
From glancing at the LightGBM code, I believe there is a conflict between the numBatches parameter and earlyStoppingRound. If you set both, I think training may hit early stopping in the first batch and then never train on the remaining batches.
This would be very suboptimal. earlyStoppingRound is intended to improve generalization. It would be tragic if it caused training to see only a small fraction of the data, greatly reducing generalization.
I think that when both of these parameters are set, early stopping should apply separately to each batch, i.e. when training has stopped making progress on the current batch, it should move on to the next batch rather than end training altogether.
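To make the difference concrete, here is a toy sketch in plain Python. It is not SynapseML's or LightGBM's actual code: `train_batches`, its `per_batch` flag, and the validation losses are all made up for illustration. It only counts how many iterations each batch would get under the behavior I suspect (one stop ends everything) versus the behavior I am proposing (a stop only ends the current batch).

```python
from typing import List, Sequence


def train_batches(
    batch_losses: Sequence[Sequence[float]],
    early_stopping_round: int,
    per_batch: bool,
) -> List[int]:
    """Return how many iterations are run on each batch.

    batch_losses[b][i] is a hypothetical validation loss after iteration i
    on batch b (lower is better). With per_batch=False, the first early stop
    ends training entirely; with per_batch=True, it only ends that batch.
    """
    iterations_run: List[int] = []
    stopped = False
    for losses in batch_losses:
        if stopped and not per_batch:
            # Suspected current behavior: later batches are never trained on.
            iterations_run.append(0)
            continue
        best = float("inf")
        rounds_without_improvement = 0
        count = 0
        for loss in losses:
            count += 1
            if loss < best:
                best = loss
                rounds_without_improvement = 0
            else:
                rounds_without_improvement += 1
                if rounds_without_improvement >= early_stopping_round:
                    stopped = True
                    break
        iterations_run.append(count)
    return iterations_run


# Made-up losses for three batches; the first batch stalls at its 3rd iteration.
example_losses = [[1.0, 0.9, 0.95, 0.8], [0.7, 0.6, 0.65], [0.5, 0.4]]

global_stop = train_batches(example_losses, early_stopping_round=1, per_batch=False)
batch_stop = train_batches(example_losses, early_stopping_round=1, per_batch=True)
print(global_stop)  # [3, 0, 0]
print(batch_stop)   # [3, 3, 2]
```

In the first case the second and third batches contribute nothing to the model, which is the outcome I am worried about.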
Code to reproduce issue
from synapse.ml.lightgbm import LightGBMRegressor

model = LightGBMRegressor(
    featuresCol="featureVector",
    labelCol="label",
    predictionCol="prediction",
    validationIndicatorCol="validation",
    numIterations=800,
    numBatches=10,
    earlyStoppingRound=1,
)
# training_data is a Spark DataFrame containing the columns named above
model.fit(training_data)  # may stop in the first batch, without seeing 90% of the training data
Other info / logs
No response
What component(s) does this bug affect?
What language(s) does this bug affect?
What integration(s) does this bug affect?