How to strip useless characters from a UTF-8 list

I have the following snippet.

import bs4
import requests

def profile_details():  # fetch people whose profile text mentions Grab
    payload = 'grab'
    global result_people
    result_people = []
    for page_number in range(0, 5):
        git_url = "https://github.com/search?p=" + str(page_number) + "&q=" + str(payload) + "&type=Users"
        # burp0_headers and burp0_cookies are not shown in the question
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        for element in page_parse:
            test = element.text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                # encoding to UTF-8 turns each str into a bytes object
                result_people.append(test.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)

The output looks something like this:

[b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        ', b'\n          Coding at Amazon, previously @Grab\n', b'\n          Software Engineer @grab \r\nPreviously @shopback \n        ', b'\n          UX Engineer @ Grab\n', b'\n          Designer at @Grab. Design Systems. Emerging tech (AR).\n        ', b'\n          Mobile Developer (iOS) @Grab. Previously Flipkart.\n        ', b'\n          Data science and engineering at Grab\n', b'\n          Software Engineer @ Grab.\n        ', b"\n          Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n        ", b'\n          Frontend Software Engineer at Grab\n', b'\n          Developer @Grab(GrabTaxi)\n        ', b'\n          Full Stack - Software Engineer @ Grab | AI Enthusiast\n        ', b'\n          Software Engineer at Grab\n', b'\n          Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n        ', b'\n          Ex-Engineering Lead @grab, Ex-DoE @90seconds\n        ', b'\n          Software Engineer/ Gopher. Worked @grab, @microsoft\n        ']

I want to strip byte sequences such as \xf0\x9f\x8c\x9d from the list.

The printed output looks like a mess:

b'\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n        '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '

What is the easiest and most convenient way to achieve this?

Thanks in advance

asked Nov 13 at 1:54

attacker nine

122

python-3.x

2 Answers

accepted

Welcome to StackOverflow!

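You can do it by removing all non-ASCII characters from each string. Decode the bytes as UTF-8 first, then encode to ASCII while ignoring anything that cannot be represented:

for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))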
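As a minimal sketch of the same idea, assuming the items are UTF-8 bytes like those in the question, the round trip can be wrapped in a small function that also decodes the result back to `str` and collapses the leftover whitespace. The `strip_non_ascii` name and the shortened sample value are illustrative and not part of the original post.

```python
def strip_non_ascii(raw: bytes) -> str:
    """Drop every non-ASCII character from UTF-8 bytes and return a tidy str."""
    text = raw.decode("utf-8")                            # bytes -> str
    ascii_bytes = text.encode("ascii", errors="ignore")   # silently drop non-ASCII characters
    # Decode back to str, then collapse newlines and repeated spaces.
    return " ".join(ascii_bytes.decode("ascii").split())


# Hypothetical sample, shortened from the output shown in the question.
sample = b"\n          Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Ex-@grab\r\n\n        "
print(strip_non_ascii(sample))  # Front End @facebook Ex-@grab
```

Because the result is a `str` rather than `bytes`, printing it no longer shows the `b'...'` wrapper or escaped newlines.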

edited Nov 13 at 2:12

answered Nov 13 at 2:05

Andreas

1,293516

I resolved it by encoding the text to ASCII with 'ignore' as the error handler, so characters that cannot be encoded are dropped, and then decoding the result back to a string.

result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
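A sketch of how this could look at collection time, assuming the profile texts are already `str` objects (as `.text` from BeautifulSoup is). The names `clean_bio` and `collect_matches` and the sample strings are hypothetical, used only for illustration. Note that this approach also drops legitimate non-ASCII letters such as accented characters, not just emoji.

```python
KEYWORDS = ("@ Grab", "at Grab", "@Grab", "@grab")  # same markers as in the question


def clean_bio(text: str) -> str:
    """Remove non-ASCII characters and collapse runs of whitespace."""
    ascii_only = text.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.split())


def collect_matches(bios):
    """Return cleaned bios that mention any of the keywords."""
    return [clean_bio(bio) for bio in bios if any(k in bio for k in KEYWORDS)]


# Hypothetical inputs modelled on the profile texts in the question.
bios = [
    "\n  Software Engineer @grab \r\nPreviously @shopback \n  ",
    "\n  Coding at Amazon, previously @Grab\n",
    "\n  Unrelated profile \U0001F31D\n",
]
for bio in collect_matches(bios):
    print(bio)
```

Keeping the values as `str` throughout avoids the `bytes` representation that caused the escaped output in the first place.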

answered Nov 13 at 2:06

attacker nine

122
