Spark Equivalent of IF Then ELSE











up vote
9
down vote

favorite
6












I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



from pyspark.sql import functions as F

iris_spark_df = iris_spark.withColumn(
"Class",
F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


Throws the following error.



---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

TypeError: when() takes exactly 2 arguments (3 given)


---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-157-21818c7dc060> in <module>()
----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

TypeError: when() takes exactly 2 arguments (3 given)


Any idea why?










share|improve this question




























    up vote
    9
    down vote

    favorite
    6












    I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



    I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



    from pyspark.sql import functions as F

    iris_spark_df = iris_spark.withColumn(
    "Class",
    F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


    Throws the following error.



    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-157-21818c7dc060> in <module>()
    ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

    TypeError: when() takes exactly 2 arguments (3 given)


    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    <ipython-input-157-21818c7dc060> in <module>()
    ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

    TypeError: when() takes exactly 2 arguments (3 given)


    Any idea why?










    share|improve this question


























      up vote
      9
      down vote

      favorite
      6









      up vote
      9
      down vote

      favorite
      6






      6





      I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



      I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



      from pyspark.sql import functions as F

      iris_spark_df = iris_spark.withColumn(
      "Class",
      F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


      Throws the following error.



      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      Any idea why?










      share|improve this question















      I have seen this question earlier here and I have took lessons from that. However I am not sure why I am getting an error when I feel it should work.



      I want to create a new column in existing Spark DataFrame by some rules. Here is what I wrote. iris_spark is the data frame with a categorical variable iris_spark with three distinct categories.



      from pyspark.sql import functions as F

      iris_spark_df = iris_spark.withColumn(
      "Class",
      F.when(iris_spark.iris_class == 'Iris-setosa', 0, F.when(iris_spark.iris_class == 'Iris-versicolor',1)).otherwise(2))


      Throws the following error.



      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      ---------------------------------------------------------------------------
      TypeError Traceback (most recent call last)
      <ipython-input-157-21818c7dc060> in <module>()
      ----> 1 iris_spark_df=iris_spark.withColumn("Class",F.when(iris_spark.iris_class=='Iris-setosa',0,F.when(iris_spark.iris_class=='Iris-versicolor',1)))

      TypeError: when() takes exactly 2 arguments (3 given)


      Any idea why?







      python apache-spark pyspark apache-spark-sql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Dec 10 '17 at 1:43









      Community

      11




      11










      asked Aug 19 '16 at 21:59









      Baktaawar

      1,62582957




      1,62582957
























          1 Answer
          1






          active

          oldest

          votes

















          up vote
          25
          down vote













          Correct structure is either:



          (when(col("iris_class") == 'Iris-setosa', 0)
          .when(col("iris_class") == 'Iris-versicolor', 1)
          .otherwise(2))


          which is equivalent to



          CASE 
          WHEN (iris_class = 'Iris-setosa') THEN 0
          WHEN (iris_class = 'Iris-versicolor') THEN 1
          ELSE 2
          END


          or:



          (when(col("iris_class") == 'Iris-setosa', 0)
          .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
          .otherwise(2)))


          which is equivalent to:



          CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
          ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
          ELSE 2
          END
          END


          with general syntax:



          when(condition, value).when(...)


          or



          when(condition, value).otherwise(...)


          You probably mixed up things with Hive IF conditional:



          IF(condition, if-true, if-false)


          which can be used only in raw SQL with Hive support.






          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f39048229%2fspark-equivalent-of-if-then-else%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            25
            down vote













            Correct structure is either:



            (when(col("iris_class") == 'Iris-setosa', 0)
            .when(col("iris_class") == 'Iris-versicolor', 1)
            .otherwise(2))


            which is equivalent to



            CASE 
            WHEN (iris_class = 'Iris-setosa') THEN 0
            WHEN (iris_class = 'Iris-versicolor') THEN 1
            ELSE 2
            END


            or:



            (when(col("iris_class") == 'Iris-setosa', 0)
            .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
            .otherwise(2)))


            which is equivalent to:



            CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
            ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
            ELSE 2
            END
            END


            with general syntax:



            when(condition, value).when(...)


            or



            when(condition, value).otherwise(...)


            You probably mixed up things with Hive IF conditional:



            IF(condition, if-true, if-false)


            which can be used only in raw SQL with Hive support.






            share|improve this answer



























              up vote
              25
              down vote













              Correct structure is either:



              (when(col("iris_class") == 'Iris-setosa', 0)
              .when(col("iris_class") == 'Iris-versicolor', 1)
              .otherwise(2))


              which is equivalent to



              CASE 
              WHEN (iris_class = 'Iris-setosa') THEN 0
              WHEN (iris_class = 'Iris-versicolor') THEN 1
              ELSE 2
              END


              or:



              (when(col("iris_class") == 'Iris-setosa', 0)
              .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
              .otherwise(2)))


              which is equivalent to:



              CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
              ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
              ELSE 2
              END
              END


              with general syntax:



              when(condition, value).when(...)


              or



              when(condition, value).otherwise(...)


              You probably mixed up things with Hive IF conditional:



              IF(condition, if-true, if-false)


              which can be used only in raw SQL with Hive support.






              share|improve this answer

























                up vote
                25
                down vote










                up vote
                25
                down vote









                Correct structure is either:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2))


                which is equivalent to



                CASE 
                WHEN (iris_class = 'Iris-setosa') THEN 0
                WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END


                or:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2)))


                which is equivalent to:



                CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
                ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END
                END


                with general syntax:



                when(condition, value).when(...)


                or



                when(condition, value).otherwise(...)


                You probably mixed up things with Hive IF conditional:



                IF(condition, if-true, if-false)


                which can be used only in raw SQL with Hive support.






                share|improve this answer














                Correct structure is either:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2))


                which is equivalent to



                CASE 
                WHEN (iris_class = 'Iris-setosa') THEN 0
                WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END


                or:



                (when(col("iris_class") == 'Iris-setosa', 0)
                .otherwise(when(col("iris_class") == 'Iris-versicolor', 1)
                .otherwise(2)))


                which is equivalent to:



                CASE WHEN (iris_class = 'Iris-setosa') THEN 0 
                ELSE CASE WHEN (iris_class = 'Iris-versicolor') THEN 1
                ELSE 2
                END
                END


                with general syntax:



                when(condition, value).when(...)


                or



                when(condition, value).otherwise(...)


                You probably mixed up things with Hive IF conditional:



                IF(condition, if-true, if-false)


                which can be used only in raw SQL with Hive support.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Mar 1 '17 at 23:54

























                answered Aug 19 '16 at 22:26









                zero323

                161k39461562




                161k39461562






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f39048229%2fspark-equivalent-of-if-then-else%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

                    ComboBox Display Member on multiple fields

                    Is it possible to collect Nectar points via Trainline?