Feature importance using lightgbm

I am trying to use LightGBM for feature selection, as shown below.

Initialization:

    import numpy as np
    import lightgbm as lgb
    from sklearn.model_selection import train_test_split

    # Initialize an empty array to hold feature importances
    feature_importances = np.zeros(features_sample.shape[1])

    # Create the model with several hyperparameters
    model = lgb.LGBMClassifier(objective='binary',
                               boosting_type='goss',
                               n_estimators=10000, class_weight='balanced')

Then I fit the model as below:

    # Fit the model twice to avoid overfitting
    for i in range(2):

        # Split into training and validation set
        train_features, valid_features, train_y, valid_y = train_test_split(
            train_X, train_Y, test_size=0.25, random_state=i)

        # Train using early stopping
        model.fit(train_features, train_y, early_stopping_rounds=100,
                  eval_set=[(valid_features, valid_y)],
                  eval_metric='auc', verbose=200)

        # Record the feature importances
        feature_importances += model.feature_importances_

But I get the following error:

    Training until validation scores don't improve for 100 rounds.
    Early stopping, best iteration is: [6] valid_0's auc: 0.88648
    ValueError: operands could not be broadcast together with shapes (87,) (83,) (87,)

  • How do you initialize feature_importances?

    – Florian Mutel
    Nov 21 '18 at 14:36

  • @FlorianMutel see the updated post

    – Ian Okeyo
    Nov 22 '18 at 6:30

  • What is features_sample? How many features do you have? I cannot reproduce your bug with the Iris data, for example. It seems you are trying to add arrays with different shapes. Either you initialized with the wrong dimensions, or some of your features become empty (all NaN) or constant when you split your data (train / valid), and LightGBM ignores them. Try looking at your splits!

    – Florian Mutel
    Nov 22 '18 at 13:34
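To make the shape mismatch from the last comment concrete, here is a minimal sketch using NumPy only. It assumes (this is a guess from the posted code, not something the question confirms) that features_sample has 87 columns while train_X has 83. The arrays are random and `fake_importances` is a hypothetical stand-in for `model.feature_importances_`, so only the shapes matter here.

```python
import numpy as np

# Hypothetical stand-ins: the shapes mirror the error message, the data is random.
rng = np.random.default_rng(0)
features_sample = rng.normal(size=(10, 87))  # array used to size the accumulator
train_X = rng.normal(size=(10, 83))          # array actually passed to fit()


def fake_importances(X):
    """Stand-in for model.feature_importances_: one value per column of X."""
    return np.ones(X.shape[1])


# Reproduce the failure: an accumulator of length 87 plus a vector of length 83.
feature_importances = np.zeros(features_sample.shape[1])
try:
    feature_importances += fake_importances(train_X)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Size the accumulator from the matrix that is fitted, and check before adding.
feature_importances = np.zeros(train_X.shape[1])
for i in range(2):
    importances = fake_importances(train_X)
    assert importances.shape == feature_importances.shape
    feature_importances += importances
feature_importances /= 2  # average over the two fits
```

If the real cause is different (for example, columns dropped between building features_sample and train_X), the shape assertion inside the loop should at least point at the iteration where the lengths diverge.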

python python-3.x lightgbm
asked Nov 21 '18 at 13:58 by Ian Okeyo (edited Nov 22 '18 at 6:30)

1 Answer

An example of plotting feature importances in LightGBM for a model trained with lgb.train:

    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns
    import warnings

    warnings.simplefilter(action='ignore', category=FutureWarning)

    def plotImp(model, X, num=20):
        # Pair each importance value with its column name
        feature_imp = pd.DataFrame(sorted(zip(model.feature_importance(), X.columns)),
                                   columns=['Value', 'Feature'])
        plt.figure(figsize=(40, 20))
        sns.set(font_scale=5)
        # Plot the `num` most important features
        sns.barplot(x="Value", y="Feature",
                    data=feature_imp.sort_values(by="Value", ascending=False)[0:num])
        plt.title('LightGBM Features (avg over folds)')
        plt.tight_layout()
        # Save before show(); saving afterwards can write an empty figure
        plt.savefig('lgbm_importances-01.png')
        plt.show()

answered Dec 2 '18 at 8:27 by rosefun
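A note on the two interfaces involved: the answer calls `model.feature_importance()`, which is the method on the Booster returned by lgb.train, while the question uses lgb.LGBMClassifier, which exposes the `feature_importances_` attribute instead. The sketch below is one possible way to handle both; `top_features` is a hypothetical helper name, not part of LightGBM, and it assumes the model has already been fitted.

```python
def top_features(model, feature_names, num=20):
    """Return the `num` most important (name, value) pairs, highest first.

    `top_features` is a hypothetical helper, not part of LightGBM.
    """
    if hasattr(model, "feature_importances_"):
        # scikit-learn wrapper such as lgb.LGBMClassifier
        values = list(model.feature_importances_)
    else:
        # Booster object as returned by lgb.train
        values = list(model.feature_importance())

    # Fail early with a readable message instead of a broadcasting error later.
    if len(values) != len(feature_names):
        raise ValueError(
            "got %d importances for %d feature names"
            % (len(values), len(feature_names))
        )

    pairs = sorted(zip(feature_names, values), key=lambda p: p[1], reverse=True)
    return pairs[:num]
```

With the classifier from the question this would be called as `top_features(model, train_X.columns)`, assuming train_X is a pandas DataFrame. The length check is there because a mismatch between importances and column names is the same kind of problem as the error in the question.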