Filtering DataFrame values within quotes
I am generating a DataFrame from command-line output with code like the below:
df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]
df_output_lines = list(filter(None, df_output_lines))
and then converting it into a DataFrame:
df=pd.DataFrame(df_output_lines)
df
The data is in the format below:
abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])
abc
I want to transform it so that the text before the : becomes the column name and the text within the quotes " " becomes the value, and the same goes for all columns. The output should be like:
As of now I am doing it the hard way:
abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)
and then
abc['time'] = abc['time'].map(lambda x: str(x)[:-1])
abc['time'] = abc['time'].map(lambda x: str(x)[6:])
abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])
abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])
abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)
Any suggestions for a lambda expression or a one-liner to do this?
My raw log output looks like the below:
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "
time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"
time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"
python python-3.x pandas dataframe lambda
1
This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or json?
– ALollz
Nov 19 '18 at 16:15
This log is being generated on the command line and I am capturing it in a DataFrame with code.
– user10177566
Nov 19 '18 at 16:17
2
I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...
– Jon Clements♦
Nov 19 '18 at 16:20
@JonClements i have edited my question .
– user10177566
Nov 19 '18 at 16:21
@ALollz i have edited my question.
– user10177566
Nov 19 '18 at 16:21
edited Nov 19 '18 at 17:53
asked Nov 19 '18 at 16:14
user10177566
3 Answers
Feed list of dictionaries to pd.DataFrame
The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:
res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
print(res)
                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
It's unclear what logic you use to determine that only 'null' strings are surrounded by parentheses.
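A variation on the same idea, as a minimal sketch: str.partition splits on the first : only, so the helper can report a cell that does not have the key:"value" shape instead of failing with "dictionary update sequence element #0 has length 1; 2 is required". The helper name cell_to_pair is hypothetical, and the sample abc is two rows taken from the question.

```python
import pandas as pd

# Two rows copied from the question's definition of abc.
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

def cell_to_pair(cell):
    """Turn 'key:"value"' into ('key', 'value'). Hypothetical helper name."""
    # partition splits on the first ':' only, so colons inside the value survive.
    key, sep, value = cell.partition(':')
    if not sep:
        # A cell without any ':' is what makes dict() complain in the one-liner.
        raise ValueError('expected key:"value", got %r' % (cell,))
    return key, value.strip('"')

res = pd.DataFrame([dict(cell_to_pair(c) for c in row) for row in abc.values])
print(res)
```

If abc has already been cleaned (no key: prefix left in the cells), this raises a ValueError naming the offending cell, which is easier to diagnose than the dict error.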
It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"
– user10177566
Nov 19 '18 at 16:30
@ak333, I'm using your definition of abc and it works fine for me.
– jpp
Nov 19 '18 at 16:31
i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– user10177566
Nov 19 '18 at 16:32
@ak333, That's the problem, use abc as you defined in your question.
– jpp
Nov 19 '18 at 16:35
I apologize..i got it. @jpp
– user10177566
Nov 19 '18 at 16:38
Given your example input of:
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "
time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"
This is coming from your os.popen command. We filter out blank lines and use shlex.split on each line so that whitespace in quoted items is preserved (but the quotes themselves are removed), e.g.:
import os
import shlex
import pandas as pd
rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]
This'll give you, for example, rows[0] of:
['time:11:22:20.000',
'instance:(null)',
'id:723927731576482920',
'channel:sip:confctl.com',
'type:control',
'elapsedtime:0.000631',
'level:info',
'operation:Init',
'message:Initialize (version 4.9.0002.30618) ... ']
You then partition those on : to separate the identifier from the value and feed that into a pd.DataFrame, e.g.:
df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)
Giving you a df of:
channel elapsedtime id instance level message operation time type
0 sip:confctl.com 0.000631 723927731576482920 (null) info Initialize (version 4.9.0002.30618) ... Init 11:22:20.000 control
1 sip:confctl.com 0.067122 723927731576482920 Ops-MacBook-Pro.local info Connecting to https://hrpd.www.vivox.com/api2/ Connect 11:22:21.000 control
2 sip:confctl-.com 2.685700 723927731576482920 Ops-MacBook-Pro.local info Connected to https://hrpd.www.vivox.com/api2/ Connect 11:22:23.000 control
3 sip:confctl-.com 2.814268 723927731576482920 Ops-MacBook-Pro.local info Logged in .tester_food. Login 11:22:23.000 control
4 sip:confctl-.com 2.912255 723927731576482920 Ops-MacBook-Pro.local error .tester_food. failed to join sip:confctl-2@hrp... Call 11:22:23.000 control
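For the log that mixes control lines with one media stats line, the same approach can be sketched as below. The raw string is only a stand-in for the os.popen output, with two lines abbreviated from the question's log; the names raw, control and media are assumptions for illustration. Keys that a line does not have end up as NaN, so splitting on type and dropping all-empty columns gives one frame per kind of line.

```python
import shlex
import pandas as pd

# Stand-in for os.popen("...").read(); lines abbreviated from the question's log.
raw = '''
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" type:"control" level:"info" message:"Initialize (version 4.9.0002.30618) ... "

time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" type:"media" predictedmos:"3" callid:"2477580077"
'''

# Skip blank lines; shlex.split keeps quoted whitespace and drops the quotes.
rows = [shlex.split(line) for line in raw.splitlines() if line.strip()]

# partition(':')[::2] yields (key, value), splitting on the first ':' only.
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])

# Columns a line does not have are NaN; drop the ones that are empty per type.
control = df[df['type'] == 'control'].dropna(axis=1, how='all')
media = df[df['type'] == 'media'].dropna(axis=1, how='all')

print(control)
print(media)
```

All values are still strings at this point, so numeric columns such as predictedmos would need converting (for example with pd.to_numeric) before doing arithmetic on them.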
This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has limited lines; I mean there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?
– user10177566
Nov 19 '18 at 17:29
1
@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
– Jon Clements♦
Nov 19 '18 at 17:31
I have added that. I am getting a kind of list of lines on the command prompt and I am storing that using popen. Just that one long line, which gives me the stats, is causing the whole trouble as it's different from all the other lines.
– user10177566
Nov 19 '18 at 17:34
I hope I edited as per your expectation. I don't mind getting two dataframes, one for the line which I added and one for the other lines.
– user10177566
Nov 19 '18 at 17:40
@ak333 does that line actually end in ', cos shlex.split won't be happy with that... you can try stripping ', from each line before splitting them and seeing if that works.... that might give you one big df and then you filter out on columns that you know only belong to those entry types etc...
– Jon Clements♦
Nov 19 '18 at 17:50
Though an answer has already been given, I would like to add a regex-based approach to achieve the same:
>>> abc
time instance id
0 time:"08:59:38.000" instance:"(null)" id:"3214039276626790405"
1 time:"08:59:38.000" instance:"(Ops-MacBook-Pro.local)" id:"3214039276626790405"
2 time:"08:59:38.000" instance:"(Ops-MacBook-Pro.local)" id:"3214039276626790405"
Just apply DataFrame.replace with regex=True:
>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
time instance id
0 08:59:38.000 null 3214039276626790405
1 08:59:38.000 Ops-MacBook-Pro.local 3214039276626790405
2 08:59:38.000 Ops-MacBook-Pro.local 3214039276626790405
OR
# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)
regex explanation:
1st Alternative instance: matches the characters instance: literally (case sensitive)
2nd Alternative id: matches the characters id: literally (case sensitive)
3rd Alternative time: matches the characters time: literally (case sensitive)
4th Alternative " matches the character " literally (case sensitive)
5th Alternative [()] matches a single character present in the list ()
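When the log has many keys, listing each one in the pattern gets long. A possible generalization, as a sketch only: strip whatever word precedes :" at the start of each cell plus the closing quote, and take the column names from the first row. This assumes every cell has the key:"value" shape and that keys consist of word characters; the names names and out are hypothetical. Unlike the pattern above, it leaves the parentheses in place.

```python
import pandas as pd

# Two rows copied from the question's definition of abc.
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Column names: the text before the first ':' in each cell of the first row.
names = abc.iloc[0].str.split(':', n=1).str[0]

# Remove the leading key:" and the trailing " from every cell.
out = abc.replace(r'^\w+:"|"$', '', regex=True)
out.columns = names.tolist()

print(out)
```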
The first answer (list of dictionaries) was answered Nov 19 '18 at 16:22 by jpp.
its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– user10177566
Nov 19 '18 at 16:30
@ak333, I'm using your definition ofabc
and it works fine for me.
– jpp
Nov 19 '18 at 16:31
i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– user10177566
Nov 19 '18 at 16:32
@ak333, That's the problem, useabc
as you defined in your question.
– jpp
Nov 19 '18 at 16:35
I apologize..i got it. @jpp
– user10177566
Nov 19 '18 at 16:38
|
show 1 more comment
its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– user10177566
Nov 19 '18 at 16:30
@ak333, I'm using your definition ofabc
and it works fine for me.
– jpp
Nov 19 '18 at 16:31
i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– user10177566
Nov 19 '18 at 16:32
@ak333, That's the problem, useabc
as you defined in your question.
– jpp
Nov 19 '18 at 16:35
I apologize..i got it. @jpp
– user10177566
Nov 19 '18 at 16:38
its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– user10177566
Nov 19 '18 at 16:30
its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"
– user10177566
Nov 19 '18 at 16:30
@ak333, I'm using your definition of
abc
and it works fine for me.– jpp
Nov 19 '18 at 16:31
@ak333, I'm using your definition of
abc
and it works fine for me.– jpp
Nov 19 '18 at 16:31
i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– user10177566
Nov 19 '18 at 16:32
i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])
– user10177566
Nov 19 '18 at 16:32
@ak333, That's the problem, use
abc
as you defined in your question.– jpp
Nov 19 '18 at 16:35
@ak333, That's the problem, use
abc
as you defined in your question.– jpp
Nov 19 '18 at 16:35
I apologize..i got it. @jpp
– user10177566
Nov 19 '18 at 16:38
I apologize..i got it. @jpp
– user10177566
Nov 19 '18 at 16:38
|
show 1 more comment
Given your example input of:
time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "
time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."
time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"
Which is coming from your os.popen
command, then we filter out blank lines and attempt to shlex.split
the line so that whitespace in quoted items is preserved (but the quotes themselves are removed), eg:
import os
import shlex
import pandas as pd
rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]
This'll give you, for example rows[0]
of:
['time:11:22:20.000',
'instance:(null)',
'id:723927731576482920',
'channel:sip:confctl.com',
'type:control',
'elapsedtime:0.000631',
'level:info',
'operation:Init',
'message:Initialize (version 4.9.0002.30618) ... ']
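As a small sketch of why shlex.split is used here instead of a plain str.split, the snippet below runs both on a shortened copy of the first log line. The variable names naive and tokens are only illustrative.

```python
import shlex

# Shortened copy of the first log line from the example input.
line = 'time:"11:22:20.000" instance:"(null)" message:"Initialize (version 4.9.0002.30618) ... "'

# str.split breaks on every space, so the quoted message is cut into pieces.
naive = line.split()

# shlex.split treats a quoted run as part of one token and drops the quotes.
tokens = shlex.split(line)

print(len(naive), len(tokens))
print(tokens[-1])
```

The naive split scatters the message over several list items, whereas shlex.split keeps each key:value pair as a single token.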
You then partition each item on the first : to separate the identifier from the value and feed the result into pd.DataFrame, e.g.:
df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)
Giving you a df of:
   channel           elapsedtime  id                  instance               level  message                                            operation  time          type
0  sip:confctl.com   0.000631     723927731576482920  (null)                 info   Initialize (version 4.9.0002.30618) ...            Init       11:22:20.000  control
1  sip:confctl.com   0.067122     723927731576482920  Ops-MacBook-Pro.local  info   Connecting to https://hrpd.www.vivox.com/api2/     Connect    11:22:21.000  control
2  sip:confctl-.com  2.685700     723927731576482920  Ops-MacBook-Pro.local  info   Connected to https://hrpd.www.vivox.com/api2/      Connect    11:22:23.000  control
3  sip:confctl-.com  2.814268     723927731576482920  Ops-MacBook-Pro.local  info   Logged in .tester_food.                            Login      11:22:23.000  control
4  sip:confctl-.com  2.912255     723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...  Call       11:22:23.000  control
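The steps above can be put together in one self-contained sketch. It replaces the os.popen call with a literal string holding two of the example lines, and parse_log is a hypothetical helper name, not something from the original answer. The conversion of elapsedtime with pd.to_numeric is an optional extra, added on the assumption that numeric columns are wanted as numbers rather than strings.

```python
import shlex
import pandas as pd

# Stand-in for the command output: two lines copied from the example input,
# separated by a blank line to show that blank lines are skipped.
raw = '''time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
'''

def parse_log(text):
    """Hypothetical helper: turn key:"value" log text into a DataFrame."""
    rows = [shlex.split(line) for line in text.splitlines() if line.strip()]
    # partition(':') splits on the first colon only, so values that contain
    # colons themselves (times, sip: addresses, URLs) stay intact.
    # [::2] keeps the key and the value and drops the separator.
    records = [dict(col.partition(':')[::2] for col in row) for row in rows]
    return pd.DataFrame(records)

df = parse_log(raw)

# Optional: every parsed value is a string, so convert numeric columns explicitly.
df['elapsedtime'] = pd.to_numeric(df['elapsedtime'])
```

On recent pandas versions the columns are likely to appear in the order the keys occur in the log rather than alphabetically as in the output shown above; the contents are the same either way.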
This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has only a few lines; there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?
– user10177566
Nov 19 '18 at 17:29
@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.
– Jon Clements♦
Nov 19 '18 at 17:31
I have added that. I am getting a list of lines on the command prompt and I am storing it using popen. It is just that one long line, the one giving me the stats, that causes the whole trouble, as it is different from all the other lines.
– user10177566
Nov 19 '18 at 17:34
I hope I edited it as you expected. I don't mind getting two dataframes: one for the line which I added and one for the other lines.
– user10177566
Nov 19 '18 at 17:40
@ak333 does that line actually end in ', cos shlex.split won't be happy with that... you can try stripping ', from each line before splitting them and seeing if that works.... that might give you one big df and then you filter out on columns that you know only belong to those entry types etc...
– Jon Clements♦
Nov 19 '18 at 17:50
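The suggestion in the last comment can be sketched as follows. The exact shape of the odd line is not shown here, so this assumes it is an ordinary log line followed by a stray ', at the end; split_line is a hypothetical helper name. An unmatched quote makes shlex.split raise ValueError, so the helper strips the stray characters first and returns None for anything that still cannot be parsed.

```python
import shlex

def split_line(line):
    """Hypothetical helper: strip a stray trailing ', and split the line.

    Returns None when the line still has unbalanced quotes.
    """
    # rstrip("',") removes any run of trailing ' and , characters only,
    # so the closing double quote of the last value is left alone.
    cleaned = line.strip().rstrip("',")
    try:
        return shlex.split(cleaned)
    except ValueError:
        return None

good = 'time:"11:22:23.000" level:"info" message:"Logged in .tester_food."'
bad = good + "',"  # assumed form of the troublesome line ending

tokens_good = split_line(good)
tokens_bad = split_line(bad)
```

Lines for which split_line returns None can then be collected separately, for example to build a second dataframe for the differently shaped entries.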
answered Nov 19 '18 at 17:23
Jon Clements♦
Although an answer has already been given, I would like to add a regex-based approach to achieve the same:
>>> abc
   time                 instance                            id
0  time:"08:59:38.000"  instance:"(null)"                   id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
Just use DataFrame.replace with regex=True:
>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
   time          instance               id
0  08:59:38.000  null                   3214039276626790405
1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
OR
# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)
regex explanation:
1st alternative 'instance:' matches the characters instance: literally (case sensitive)
2nd alternative 'id:' matches the characters id: literally (case sensitive)
3rd alternative 'time:' matches the characters time: literally (case sensitive)
4th alternative '"' matches the character " literally
5th alternative '[()]' matches a single character from the set, i.e. either ( or )
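A variation on the same idea, offered only as a sketch: instead of listing each prefix, a single anchored pattern can strip any leading key:" and the closing quote, and the column names can be read from the first row. This pattern and the names cleaned and names are assumptions for illustration, not part of the answer above. Unlike the pattern above, it leaves the parentheses in (null) untouched.

```python
import pandas as pd

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Take the text before the first colon of each cell in row 0 as the column name.
names = [cell.split(':', 1)[0] for cell in abc.iloc[0]]

# ^\w+:"  removes a leading key and opening quote; "$ removes the closing quote.
cleaned = abc.replace(r'^\w+:"|"$', '', regex=True)
cleaned.columns = names
```

Because the pattern is anchored to the start and end of each cell, colons and quotes inside a value are not affected.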
edited Nov 19 '18 at 17:31
answered Nov 19 '18 at 16:55
pygo
This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or json?
– ALollz
Nov 19 '18 at 16:15
This log is generated on the command line, and I am capturing it in a DataFrame with code.
– user10177566
Nov 19 '18 at 16:17
I'm thinking the same as @ALollz here... what does a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and reliable...
– Jon Clements♦
Nov 19 '18 at 16:20
@JonClements I have edited my question.
– user10177566
Nov 19 '18 at 16:21
@ALollz I have edited my question.
– user10177566
Nov 19 '18 at 16:21