Introduction to LightGBM Algorithm

Posted by Ashish Bhatnagar | 15th August 2021

As machine learning engineers and data scientists, we are always looking for algorithms that train fast and still produce accurate results.

One such algorithm, introduced by Microsoft, is LightGBM.

LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms, used for ranking, classification, and many other machine learning tasks.

Like other tree-based methods such as Random Forest and XGBoost, it is built on decision trees, but it splits the tree leaf-wise on the best fit, whereas most boosting implementations such as XGBoost grow trees depth-wise (level-wise) rather than leaf-wise. When growing on the same leaf, the leaf-wise algorithm can reduce more loss than the level-wise algorithm, which often yields accuracy that is hard to match with other boosting techniques.

Image source: https://www.akira.ai/glossary/lightgbm/

As the size of data increases day by day, it is becoming very difficult for machine learning algorithms to give fast, accurate results. LightGBM is popular because of its super-fast speed: it can easily handle large datasets and takes much less memory to run. It is also popular because it focuses on the accuracy of results, and it supports GPU (Graphics Processing Unit) learning, which is why machine learning engineers widely use it for Artificial Intelligence application development.
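For instance, assuming LightGBM has been compiled with GPU support, GPU training can be requested through the device parameter. A minimal sketch (the other parameters here are illustrative, not tuned):

# Sketch: requesting GPU training (requires a GPU-enabled LightGBM build)
params = {
    "objective": "multiclass",
    "num_class": 3,
    "device": "gpu",  # training fails with an error if LightGBM was built without GPU support
}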

Also, LightGBM generates much more complex trees by following a leaf-wise split approach rather than a level-wise approach, which is the main reason it can achieve much higher accuracy than other algorithms. However, it can sometimes suffer from overfitting, which can be mitigated by tuning hyperparameters like max_depth.

It also takes much less training time than XGBoost.

One limitation of LightGBM is that it is not advisable to use it on small datasets: it is highly sensitive to overfitting and can easily overfit small data. There is no hard threshold on the number of rows, but as a rule of thumb it is recommended only for datasets with 8,000+ rows.


 

Some of the important parameters of LightGBM are:

max_depth: This parameter describes the maximum depth of a tree and is used to control model overfitting. If the model is overfitting, it is advisable to reduce max_depth.

min_data_in_leaf: The minimum number of records a particular leaf may contain. The default value is 20. It is also used to deal with overfitting; larger values make the model more conservative.

feature_fraction: The fraction of features LightGBM randomly selects on each iteration while building trees. For example, a feature_fraction of 0.2 means LightGBM selects 20% of the features at each iteration. It can be used both to speed up training and to reduce overfitting.

early_stopping_round: With this parameter set, the model stops training if a metric on one validation set does not improve in the last early_stopping_round rounds.

lambda_l1 / lambda_l2: These parameters control L1 and L2 regularization. Values typically range between 0 and 1, though any non-negative value is allowed. The sketch below shows how these options fit together.
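To make the list concrete, here is a hedged sketch of a parameter dictionary that combines the options above (the specific values are illustrative assumptions, not tuned recommendations):

# Sketch: overfitting-oriented parameters (illustrative values)
params = {
    "objective": "multiclass",
    "num_class": 3,
    "max_depth": 7,           # cap tree depth to curb overfitting
    "min_data_in_leaf": 20,   # the default; raise it if the model overfits
    "feature_fraction": 0.8,  # use 80% of the features per iteration
    "lambda_l1": 0.1,         # L1 regularization
    "lambda_l2": 0.1,         # L2 regularization
}
# early_stopping_round is supplied at training time, for example:
# lgb.train(params, train_set, valid_sets=[val_set], early_stopping_rounds=50)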

Implementation:

Installing LightGBM and loading the Iris dataset:

# Installing the LightGBM library
!pip install lightgbm

# Loading the iris dataset
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features
y = iris.target

# Bounds of the feature space (useful for plotting)
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5

Preprocessing the dataset:

# Preprocessing the dataset
from sklearn import preprocessing
from sklearn.model_selection import train_test_split

X = iris.data  # here we use all four features
y = iris.target

# Encoding class labels as integers
le = preprocessing.LabelEncoder()
y_label = le.fit_transform(y)
classes = le.classes_

# Splitting data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_label, test_size=0.30, random_state=42)

Hyperparameter tuning:

# Setting parameters for the LightGBM model
params = {
    "objective": "multiclass",
    "num_class": 3,            # the iris dataset has three classes
    "num_leaves": 60,
    "max_depth": -1,           # no depth limit
    "learning_rate": 0.01,
    "bagging_fraction": 0.9,   # subsample
    "feature_fraction": 0.9,   # colsample_bytree
    "bagging_freq": 5,         # subsample_freq
    "bagging_seed": 2018,
    "verbosity": -1
}

Training the LightGBM model:

# Training the LightGBM model
import lightgbm as lgb

lgtrain, lgval = lgb.Dataset(X_train, y_train), lgb.Dataset(X_test, y_test)
lgbmodel = lgb.train(params, lgtrain, 2000,
                     valid_sets=[lgtrain, lgval],
                     early_stopping_rounds=100,
                     verbose_eval=20)
# Note: LightGBM >= 4.0 removed these keyword arguments; pass callbacks instead:
# lgb.train(params, lgtrain, 2000, valid_sets=[lgtrain, lgval],
#           callbacks=[lgb.early_stopping(100), lgb.log_evaluation(20)])

The result on the test set:
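A minimal sketch of how the test-set result can be computed, assuming the variables from the training step above (the exact accuracy will vary by run):

# Evaluating the trained model on the test set
import numpy as np
from sklearn.metrics import accuracy_score

# For a multiclass objective, predict() returns one probability per class
pred_probs = lgbmodel.predict(X_test, num_iteration=lgbmodel.best_iteration)
pred_labels = np.argmax(pred_probs, axis=1)
print("Test accuracy:", accuracy_score(y_test, pred_labels))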

 

Hence, LightGBM is a popular, simple, and effective algorithm that can get your task accomplished quickly and accurately.


About Author

Ashish Bhatnagar

He is enthusiastic and has a good grip on the latest technologies like ML, DL, and computer vision. He is focused and always willing to learn.
