How to strip useless characters from utf-8 LIST
I have the following snippet.
import bs4
import requests

def profile_details():  # function to fetch people
    payload = 'grab'
    global result_people
    result_people = []
    for page_num in range(0, 5):
        # burp0_headers / burp0_cookies: assumed to be defined elsewhere (not shown)
        git_url = "https://github.com/search?p=" + str(page_num) + "&q=" + str(payload) + "&type=Users"
        rr = requests.get(git_url, headers=burp0_headers, cookies=burp0_cookies)
        page = bs4.BeautifulSoup(rr.text, "lxml")
        page_parse = page.select('.user-list-info p')
        for i in range(len(page_parse)):
            test = page_parse[i].text
            if ('@ Grab' in test) or ('at Grab' in test) or ('@Grab' in test) or ('@grab' in test):
                result_people.append(page_parse[i].text.encode("utf-8"))

profile_details()
for i in result_people:
    print(i)
and the output looks something like this
[b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n ', b'\n Coding at Amazon, previously @Grab\n', b'\n Software Engineer @grab \r\nPreviously @shopback \n ', b'\n UX Engineer @ Grab\n', b'\n Designer at @Grab. Design Systems. Emerging tech (AR).\n ', b'\n Mobile Developer (iOS) @Grab. Previously Flipkart.\n ', b'\n Data science and engineering at Grab\n', b'\n Software Engineer @ Grab.\n ', b"\n Finding top #talent for @Grab's #mobile #app development teams, software engineering, #iOS & #Android in #Singapore\n ", b'\n Frontend Software Engineer at Grab\n', b'\n Developer @Grab(GrabTaxi)\n ', b'\n Full Stack - Software Engineer @ Grab | AI Enthusiast\n ', b'\n Software Engineer at Grab\n', b'\n Software Engineer @Grab | Previous @udacity @disney | Open Source nut, right now juggling with iOS and Swift\n ', b'\n Ex-Engineering Lead @grab, Ex-DoE @90seconds\n ', b'\n Software Engineer/ Gopher. Worked @grab, @microsoft\n ']
I want to strip characters such as \xf0\x9f\x8c\x9d from the list.
Output seems like a mess:
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus \xc2\xb7 Ex-@grab \xf0\x9f\x87\xb8\xf0\x9f\x87\xac\r\n\n '
b'\n Coding at Amazon, previously @Grab\n'
b'\n Software Engineer @grab \r\nPreviously @shopback \n '
What is the easiest and most convenient way to achieve this?
Thanks in advance.
python-3.x
asked Nov 13 at 1:54
attacker nine
122
2 Answers
accepted
Welcome to StackOverflow!
You can do it by removing all non-ASCII characters from each string:
for i in result_people:
    print(i.decode('utf8').encode('ascii', errors='ignore'))
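As a rough sketch of the same idea, the snippet below decodes each bytes item, drops the non-ASCII characters, and decodes once more so the result prints as plain text rather than as a b'...' literal. The helper name to_ascii_text and the two sample items are illustrative assumptions modeled on the output in the question, not part of the original code.

```python
# Sample bytes modeled on the question's output (illustrative, not the full list).
result_people = [
    b'\n Front End @facebook \xf0\x9f\x8c\x9d \xc2\xb7 Maintaining Docusaurus\r\n\n ',
    b'\n Coding at Amazon, previously @Grab\n',
]


def to_ascii_text(raw: bytes) -> str:
    """Decode UTF-8 bytes, drop non-ASCII characters, trim outer whitespace."""
    text = raw.decode('utf-8')
    # errors='ignore' silently skips every character that ASCII cannot represent.
    ascii_only = text.encode('ascii', errors='ignore').decode('ascii')
    return ascii_only.strip()


cleaned = [to_ascii_text(item) for item in result_people]
for line in cleaned:
    print(line)
```

Note that this discards every non-ASCII character, including legitimate accented letters, so it only suits cases where that loss is acceptable.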
edited Nov 13 at 2:12
answered Nov 13 at 2:05
Andreas
1,293516
Resolved it by passing 'ignore' as the error handler when encoding to ASCII, which drops the non-ASCII characters, and then decoding the resulting bytes back to a string.
result_people.append(page_parse[i].text.encode('ascii', 'ignore').decode("utf-8"))
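A minimal sketch of this approach applied directly to str values, which also collapses the leftover newlines and runs of spaces. The function name clean_bio and the sample strings are hypothetical, modeled on the bios shown in the question.

```python
def clean_bio(text: str) -> str:
    """Drop non-ASCII characters from a str and collapse whitespace runs."""
    # Encoding with 'ignore' removes anything outside ASCII; decoding yields a str again.
    ascii_only = text.encode('ascii', 'ignore').decode('ascii')
    # split() with no arguments splits on any whitespace, including \r and \n.
    return ' '.join(ascii_only.split())


sample = '\n Software Engineer @grab \r\nPreviously @shopback \n '
emoji_sample = '\n Front End @facebook \U0001F31D \u00b7 Maintaining Docusaurus\n '

print(clean_bio(sample))
print(clean_bio(emoji_sample))
```

Working on the str from page_parse[i].text before anything is appended means the list never holds bytes, so nothing needs decoding at print time.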
answered Nov 13 at 2:06
attacker nine
122