Creating an agegroup variable in SAS

I need help creating this age group variable. In my data, age is measured to 9 decimal places. I can choose the categories myself; I just picked the quartiles. But I keep getting these errors...

"ERROR 388-185: Expecting an arithmetic operator.
ERROR 200-322: The symbol is not recognized and will be ignored."

I have tried rounding and changing the le to <= but it still gives the same error... :(

    data sta310.hw4;
    set sta310.gbcshort;
    age_cat=.;
    if age le 41.950498302 then age_cat = 1;
    if age > 41.950498302 and le 49.764538386 then age_cat=2;
    if age > 49.764538386 and le 56.696966378 then age_cat=3;
    if age > 56.696966378 then age_cat=4;
    run;

Tags: sas

asked Nov 20 '18 at 14:37 by Anne Peters

4 Answers

If you're grouping by quartiles, avoid hard-coding the cut points and use PROC RANK with GROUPS=4. The groups will be numbered 0 to 3 rather than 1 to 4, but it's the same idea.

    proc rank data=sta310.gbcshort out=sta310.hw4 groups=4;
    var age;
    ranks age_cat;
    run;
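
As a quick check on the PROC RANK output, a sketch like the following lists the age range that landed in each group. It assumes the step above has run and created age_cat in sta310.hw4; using PROC MEANS with a CLASS statement is just one convenient way to look and is not part of the original answer.

```sas
/* Sketch: summarize age within each rank group (0 to 3).            */
/* Assumes sta310.hw4 already contains age and the new age_cat.      */
proc means data=sta310.hw4 n min max;
  class age_cat;   /* one row of statistics per group */
  var age;
run;
```

The minimum and maximum per group show where the cut points fell, which can be compared with the hard-coded quartile values from the question.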


In your current program, this line is the issue:

    if age > 41.950498302 and le 49.764538386 then age_cat=2;

It should be:

    if 41.950498302 < age <= 49.764538386 then age_cat=2;

You should also switch those to IF/ELSE IF rather than separate IF statements. With ELSE IF, SAS stops evaluating conditions once one is true instead of checking every IF, which makes the step slightly faster. This isn't something you'll notice in your homework, but it matters on larger data sets.

    if age <= 41.950498302 then age_cat = 1;
    else if 41.950498302 < age <= 49.764538386 then age_cat=2;
    else if 49.764538386 < age <= 56.696966378 then age_cat=3;
    else if 56.696966378 < age then age_cat=4;

• Once you add the ELSE you can simplify the conditions: else if age <= 49.764538386
  – Tom, Nov 20 '18 at 15:36
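
The simplification in the comment can be sketched as a full step. Because each ELSE branch is reached only when every earlier condition was false, the lower bound of each range is already implied. The missing-value check on the first line is an addition here, not something the answers include: SAS treats a missing numeric value as smaller than any number, so without it a missing age would fall into group 1.

```sas
/* Sketch: simplified IF / ELSE IF chain with the question's cut points. */
data sta310.hw4;
  set sta310.gbcshort;
  if missing(age) then age_cat = .;               /* added guard (assumption) */
  else if age <= 41.950498302 then age_cat = 1;
  else if age <= 49.764538386 then age_cat = 2;   /* lower bound implied by ELSE */
  else if age <= 56.696966378 then age_cat = 3;
  else age_cat = 4;
run;
```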












These things are better done with PROC FORMAT. In your code, the variable name is missing after and: each comparison needs an operand on both sides. Also, you do not need age_cat=. at the beginning. Add your age variable after the and, before the comparison operator, as shown below:

    data sta310.hw4;
    set sta310.gbcshort;
    age_cat=.;
    if age le 41.950498302 then age_cat = 1;
    if age > 41.950498302 and age le 49.764538386 then age_cat=2;
    if age > 49.764538386 and age le 56.696966378 then age_cat=3;
    if age > 56.696966378 then age_cat=4;
    run;
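
The answer mentions PROC FORMAT without showing it, so here is a minimal sketch of what that approach might look like. The format name agegrp is hypothetical, and converting the label back to a number with INPUT is one possible design, not necessarily what the answerer had in mind. In a VALUE range, <- excludes the lower endpoint, which matches the "greater than ... and le ..." logic in the question.

```sas
/* Sketch: define the cut points once as a format (name agegrp is made up). */
proc format;
  value agegrp
    low          -  41.950498302 = '1'
    41.950498302 <- 49.764538386 = '2'   /* <- excludes the lower endpoint */
    49.764538386 <- 56.696966378 = '3'
    56.696966378 <- high         = '4';
run;

data sta310.hw4;
  set sta310.gbcshort;
  /* PUT returns the label as text; INPUT turns it back into a number. */
  /* Rows with a missing age are skipped and keep a missing age_cat.   */
  if not missing(age) then age_cat = input(put(age, agegrp.), 1.);
run;
```

Keeping the boundaries in a format means they can be changed in one place, or the format can be applied directly to age in a procedure without creating a new variable at all.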






answered Nov 20 '18 at 15:03 by Kiran

The and le (or and <=) syntax is incorrect: in SAS each comparison needs an operand on both sides. Such a syntax might be something out of COBOL.

Try this form of a SAS expression:

• value < variable <= value

Example

    data sta310.hw4;
    set sta310.gbcshort;
    age_cat=.;
    if age <= 41.950498302 then age_cat = 1;
    if 41.950498302 < age <= 49.764538386 then age_cat=2;
    if 49.764538386 < age <= 56.696966378 then age_cat=3;
    if 56.696966378 < age then age_cat=4;
    run;

A similar and safer sieve of logic can be accomplished using a select statement.

    select;
    when (age <= 41.950498302) age_cat=1;
    when (age <= 49.764538386) age_cat=2;
    when (age <= 56.696966378) age_cat=3;
    otherwise age_cat=4;
    end;

The SAS select differs from the C switch statement in that, once a when condition is satisfied, execution continues after the end of the select; no break is required, as is often seen in switch/case.
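
For context, the select group has to sit inside a DATA step. A sketch of the complete step, using the data set names from the question, might look like this. The first WHEN is an optional guard added here so that rows with a missing age do not satisfy the first comparison; it is not part of the original answer.

```sas
/* Sketch: the select group placed in a full DATA step. */
data sta310.hw4;
  set sta310.gbcshort;
  select;
    when (missing(age))        age_cat = .;   /* added guard (assumption) */
    when (age <= 41.950498302) age_cat = 1;
    when (age <= 49.764538386) age_cat = 2;
    when (age <= 56.696966378) age_cat = 3;
    otherwise                  age_cat = 4;   /* everything above the last cut */
  end;
run;
```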







answered Nov 20 '18 at 15:03 by Richard (edited Nov 20 '18 at 15:19)

The problem was in your if statements with multiple conditions. Also, because age_cat is not really a numeric variable (i.e., you do not want to sum it), I would make it a character variable of length 1, specifying it in a format statement upfront (best practice in SAS data management).
Finally, I would also suggest reformulating your if statements as an if/else if chain to make them more efficient:

    data sta310.hw4;
    set sta310.gbcshort;
    format age_cat $1.;
    if age <= 41.950498302 then age_cat = "1";
    else if 41.950498302 < age <= 49.764538386 then age_cat= "2";
    else if 49.764538386 < age <= 56.696966378 then age_cat="3";
    else age_cat="4";
    run;

Hope this helps.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Nov 20 '18 at 15:24









                              Daniel VieiraDaniel Vieira

                              1236




                              1236








                              • 1





                                Why you are attaching the $1. format the the new variable? SAS already knows how to print character variables. To define the variable's type and length before using it in other statements then use a LENGTH or ATTRIB statements like FORMAT or assignment statements.

                                – Tom
                                Nov 20 '18 at 15:28













                              • Yes, the purpose is to lock in the length; the FORMAT statement is more memory efficient than the LENGTH statement from a PDV construction perspective.

                                – Daniel Vieira
                                Nov 20 '18 at 15:30













                              • Whose memory? Once you have had an analysis use the wrong number of groups because a format that was too short was somehow attached to a character variable, you will remember for a long time that attaching $xx formats is a dangerous thing.

                                – Tom
                                Nov 20 '18 at 15:39













                              • I mean the total memory used when building the Program Data Vector (PDV). A LENGTH statement could also have been used, of course. The situation you describe could happen, but it is not relevant to the question at hand, so I do not understand the call-out. The question concerns a simple class variable, which I recommend making a character variable because it would occupy only 1 byte instead of the 8 bytes of a standard numeric variable. Both LENGTH and FORMAT statements are valid choices for this particular question.

                                – Daniel Vieira
                                Nov 20 '18 at 15:46






                              • 1





                                It is a pet peeve of mine because of the danger, and because of the confusion it causes for novice SAS programmers who see that usage and think the FORMAT statement is actually a way of defining the variable's length. Instead, the length is being set as a side effect of the variable's first appearance being in the FORMAT statement.

                                – Tom
                                Nov 20 '18 at 16:02
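
For readers following the LENGTH versus FORMAT exchange above, here is a minimal sketch of the LENGTH variant that Tom describes. It reuses the cut points and the input data set from the question; the output name work.hw4_length is an assumption made for illustration, not something from the thread.

```sas
/* Sketch only: declare the character variable with LENGTH rather than FORMAT. */
/* work.hw4_length is a hypothetical output name chosen for this example.      */
data work.hw4_length;
    set sta310.gbcshort;
    length age_cat $1;   /* sets type (character) and length (1); no format is attached */
    if age <= 41.950498302 then age_cat = "1";
    else if 41.950498302 < age <= 49.764538386 then age_cat = "2";
    else if 49.764538386 < age <= 56.696966378 then age_cat = "3";
    else age_cat = "4";
run;
```

Either statement leaves age_cat as a one-byte character variable here; the difference is that LENGTH states the intent directly instead of relying on where the variable first appears.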





























                              0














                              If you're grouping by quartiles, avoid hard-coding the cut points and use PROC RANK with GROUPS=4. The groups will be numbered 0 to 3 rather than 1 to 4, but it's the same idea.



                              proc rank data=sta310.gbcshort out=sta310.hw4 groups=4;
                              var age;
                              ranks age_cat;
                              run;
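
If the 1 to 4 numbering from the question is preferred, one option is to shift the PROC RANK groups afterward. This is only a sketch: the intermediate data set work.ranked, the variable age_q, and the output work.hw4_ranked are hypothetical names introduced here. The group boundaries PROC RANK picks may also differ slightly from quartiles computed by another procedure.

```sas
/* Sketch: quartile groups from PROC RANK, renumbered from 0-3 to 1-4. */
/* work.ranked, age_q and work.hw4_ranked are hypothetical names.      */
proc rank data=sta310.gbcshort out=work.ranked groups=4;
    var age;
    ranks age_q;          /* age_q holds the group number 0, 1, 2 or 3 */
run;

data work.hw4_ranked;
    set work.ranked;
    /* leave age_cat missing when age (and therefore age_q) is missing */
    if not missing(age_q) then age_cat = age_q + 1;
run;
```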


                              In your current program, this line/logic is your issue:



                              if age > 41.950498302 and le 49.764538386 then age_cat=2;


                              It should be:



                               if 41.950498302 < age <= 49.764538386 then age_cat=2;
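
The reason for the error is that `and le 49.764538386` has no left operand; each comparison needs its own. A small self-contained sketch, using made-up ages in a hypothetical data set work.demo, shows the two equivalent ways to write the range test:

```sas
/* Sketch: two equivalent range tests on a few made-up ages. */
data work.demo;
    input age;
    /* explicit form: age is repeated on both sides of AND */
    in_q2_explicit = (age > 41.950498302 and age <= 49.764538386);
    /* chained form, as used in the corrected line above */
    in_q2_chained = (41.950498302 < age <= 49.764538386);
    datalines;
40
45
50
;
run;

proc print data=work.demo;
run;
```

Both flag variables are 1 when the condition is true and 0 otherwise, so the two columns should agree on every row.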


                              You should also switch those to IF / ELSE IF rather than separate IF statements. Once a condition matches, SAS skips the remaining ELSE branches instead of testing every IF, which makes the step slightly faster. You won't notice this in your homework, but it matters if you ever work on larger data sets.



                              if age <= 41.950498302 then age_cat = 1;
                              else if 41.950498302 < age <= 49.764538386 then age_cat=2;
                              else if 49.764538386 < age <= 56.696966378 then age_cat=3;
                              else if 56.696966378 < age then age_cat=4;















                              answered Nov 20 '18 at 15:30









                              Reeza









                              • 1





                                Once you add the ELSE, you can simplify the conditions, e.g. else if age <= 49.764538386

                                – Tom
                                Nov 20 '18 at 15:36
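
Tom's simplification can be sketched as below. Because each ELSE is reached only when the earlier conditions were false, the lower bound is implied. The sketch also adds a guard for missing ages, which is an addition not discussed in the thread: SAS treats a missing numeric value as smaller than any number, so without the guard a missing age would fall into group 1. The output name work.hw4_numeric is hypothetical.

```sas
/* Sketch: simplified ELSE IF chain with an explicit guard for missing ages. */
/* work.hw4_numeric is a hypothetical output name.                           */
data work.hw4_numeric;
    set sta310.gbcshort;
    if missing(age) then age_cat = .;              /* keep missing ages out of group 1 */
    else if age <= 41.950498302 then age_cat = 1;
    else if age <= 49.764538386 then age_cat = 2;  /* lower bound implied by the ELSE */
    else if age <= 56.696966378 then age_cat = 3;
    else age_cat = 4;
run;
```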













