How to strip useless characters from utf-8 LIST











I have the following snippet.



import bs4
import requests

# burp0_headers and burp0_cookies are defined elsewhere (not shown).

def profile_details():  # function to fetch people
    payload = 'grab'
    global result_people
    result_people = []
    for i in range(0, 5):
        git_url = "https://github.com/search?p=" + str(i) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        for i in range(len(page_parse)):
            test = page_parse[i].text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                result_people.append(page_parse[i].text.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)


The output looks something like this:



[b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          UX Engineer @ Grab\n', b'\n          Designer at @Grab. Design Systems. Emerging tech (AR).\n        ', b'\n          Mobile Developer (iOS) @Grab. Previously Flipkart.\n        ', b'\n          Data science and engineering at Grab\n', b'\n          Software Engineer @ Grab.\n        ', b"\n          Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n        ", b'\n          Frontend Software Engineer at Grab\n', b'\n          Developer @Grab(GrabTaxi)\n        ', b'\n          Full Stack - Software Engineer @ Grab | AI Enthusiast\n        ', b'\n          Software Engineer at Grab\n', b'\n          Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n        ', b'\n          Ex-Engineering Lead @grab, Ex-DoE @90seconds\n        ', b'\n          Software Engineer/ Gopher. Worked @grab, @microsoft\n        ']


I want to strip characters such as \xf0\x9f\x8c\x9d from the list.



The output seems like a mess:

b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '



What is the easiest and most convenient way to achieve this?



Thanks in advance










      python-3.x
















      asked Nov 13 at 1:54









      attacker nine

























          2 Answers
                  accepted






                  Welcome to StackOverflow!



You can do it by removing all non-ASCII characters from each string:

for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))
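Note that the loop above still prints bytes objects (with the b'...' prefix and surrounding whitespace). What follows is a minimal sketch, not part of the original answer, assuming you want plain trimmed strings instead. The helper name to_ascii_text and the two sample items are illustrative only.

```python
def to_ascii_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, drop every non-ASCII character, trim whitespace."""
    text = raw.decode("utf-8")                            # bytes -> str
    ascii_bytes = text.encode("ascii", errors="ignore")   # silently drop non-ASCII
    return ascii_bytes.decode("ascii").strip()            # back to str, trimmed


# Illustrative sample data shaped like the scraped list in the question.
sample = [
    b"\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining "
    b"Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ",
    b"\n          Coding at Amazon, previously @Grab\n",
]

cleaned = [to_ascii_text(item) for item in sample]
for line in cleaned:
    print(line)
```

Inner runs of spaces left behind by the removed characters are kept as they are; only leading and trailing whitespace is trimmed.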













                  edited Nov 13 at 2:12

























                  answered Nov 13 at 2:05









                  Andreas

























I resolved it by encoding the text to ASCII with 'ignore' as the error handler, which drops every character that cannot be encoded, and then decoding the result back to a string.

result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
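The same idea can be wrapped in a small helper that also collapses the leftover newlines and runs of spaces. This is only a sketch under assumptions: the name clean_bio and the sample strings are hypothetical, and whitespace collapsing goes beyond what the one-liner above does.

```python
def clean_bio(text: str) -> str:
    """Drop non-ASCII characters, then collapse all whitespace runs to one space."""
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    # str.split() with no arguments splits on any whitespace (spaces, \r, \n)
    # and discards empty pieces, so joining with " " normalizes the spacing.
    return " ".join(ascii_only.split())


# Illustrative inputs shaped like the scraped profile text.
bios = [
    "\n   Software Engineer @grab \r\nPreviously @shopback \n   ",
    "Front End @facebook \U0001F31D \u00b7 Maintaining Docusaurus",
]
result_people = [clean_bio(bio) for bio in bios]
for person in result_people:
    print(person)
```

One caveat with any ASCII-only filter: it also removes legitimate accented letters, so a bio containing "Café" would come out as "Caf".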
















                          answered Nov 13 at 2:06









                          attacker nine































                               
