Filtering DataFrame values within quotes

I am generating a df from the output of a command-line tool with code like this:



import os

df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]
df_output_lines = list(filter(None, df_output_lines))  # drop lines that were empty (they split to [])
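Splitting on whitespace is why each cell keeps its key:"value" form, and why a quoted message containing spaces spills over several cells. A quick check on one log line, abridged here for illustration from the raw log shown further down:

```python
# Abridged version of one log line from the question (fields trimmed for brevity).
line = 'time:"11:22:23.000" operation:"Login" message:"Logged in .tester_food."'

cells = line.split()  # str.split() with no argument splits on every run of whitespace
print(cells)
# ['time:"11:22:23.000"', 'operation:"Login"', 'message:"Logged', 'in', '.tester_food."']
```

The quoted message ends up in three cells, so lines with messages of different lengths produce rows of different widths.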


and then converting it into a DataFrame:



import pandas as pd

df = pd.DataFrame(df_output_lines)
df


The data is in the format below:



abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
                    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
                    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']])
abc





I want to transform it so that the text before the first : becomes the column name and the text inside the quotes " " becomes the value, and the same for every column. The output should look like this:
[image: expected output]



For now I am doing it the hard way:



abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)


and then



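abc['time'] = abc['time'].map(lambda x: str(x)[:-1])   # drop the closing quote
abc['time'] = abc['time'].map(lambda x: str(x)[6:])    # drop the 'time:"' prefix

abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])   # drop the closing quote
abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])   # drop the 'instance:"' prefix

# \d+ = one or more digits; int64 explicitly, since these ids do not fit in 32 bits
abc['id'] = abc['id'].str.extract(r'(\d+)', expand=False).astype('int64')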
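The per-column slicing above can also be written once for every column with pandas string methods. A minimal sketch, assuming every cell has the shape key:"value" and that each column holds a single key (taken here from the first row). The ids are deliberately left as strings: the raw log below also contains ids such as 10316899144153251411, which is larger than the int64 maximum (9223372036854775807) and so cannot be stored as int64.

```python
import pandas as pd

# Same shape as `abc` in the question (two rows are enough for the sketch).
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Column names: the text before the first ':' in the first row.
abc.columns = [cell.split(':', 1)[0] for cell in abc.iloc[0]]

# Values: whatever sits between the first ':"' and the closing quote, in every column.
clean = abc.apply(lambda col: col.str.extract(r':"(.*)"$', expand=False))
print(clean)
```

Because str.extract searches from the left, the ':' characters inside the time value do not interfere: the first ':"' in each cell is the one right after the key.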


Any suggestions for a lambda expression or a one-liner to do this?



The raw log output looks like this:



time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"

python python-3.x pandas dataframe lambda

asked Nov 19 '18 at 16:14 by user10177566 (edited Nov 19 '18 at 17:53)

  • This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or JSON?

    – ALollz
    Nov 19 '18 at 16:15

  • This log is being generated on the command line, and I am capturing it in a data frame with code.

    – user10177566
    Nov 19 '18 at 16:17

  • I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...

    – Jon Clements
    Nov 19 '18 at 16:20

  • @JonClements I have edited my question.

    – user10177566
    Nov 19 '18 at 16:21

  • @ALollz I have edited my question.

    – user10177566
    Nov 19 '18 at 16:21

3 Answers

Feed a list of dictionaries to pd.DataFrame



The pd.DataFrame constructor accepts a list of dictionaries directly. Within a list comprehension, use str.rstrip to drop each cell's closing quote and str.split on ':"' to separate the key from the value:



res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

print(res)

                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000


It's unclear what logic you use to determine that only the 'null' strings are surrounded by parentheses.

answered Nov 19 '18 at 16:22 by jpp
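A self-contained sketch of the same idea, with two small additions that are assumptions rather than part of the answer: split(':"', 1) splits only at the first ':"' in each cell, so a value that itself contains ':"' stays intact, and str.strip('()') drops the parentheses from every instance value (whether only '(null)' should lose them is still the open question above):

```python
import pandas as pd

abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Each cell -> [key, value]: drop the closing quote, then split once on ':"'.
records = [dict(cell.rstrip('"').split(':"', 1) for cell in row) for row in abc.values]
res = pd.DataFrame(records)

# Assumption: parentheses around instance values are unwanted everywhere.
res['instance'] = res['instance'].str.strip('()')
print(res)
```

Recent pandas versions keep the dicts' key order (time, instance, id) for the columns; the alphabetical id/instance/time order in the output above reflects the pandas of the time, which sorted the keys.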

  • It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"

    – user10177566
    Nov 19 '18 at 16:30

  • @ak333, I'm using your definition of abc and it works fine for me.

    – jpp
    Nov 19 '18 at 16:31

  • I am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

    – user10177566
    Nov 19 '18 at 16:32

  • @ak333, that's the problem: use abc as you defined it in your question.

    – jpp
    Nov 19 '18 at 16:35

  • I apologize... I got it. @jpp

    – user10177566
    Nov 19 '18 at 16:38
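The error in this thread can be reproduced in a few lines. When the cells have already been cleaned (as in the abc from the comment), they contain no ':"', so split returns a one-item list and dict() rejects it because it needs key/value pairs. A minimal sketch; to_frame is a hypothetical helper that wraps the answer's one-liner:

```python
import pandas as pd

def to_frame(raw):
    # The answer's one-liner, wrapped in a hypothetical helper for reuse.
    return pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in raw.values])

# Already-cleaned cells: there is no ':"' left to split on.
already_clean = pd.DataFrame([['08:59:38.000', '(null)', '3214039276626790405']])

try:
    to_frame(already_clean)
    message = None
except ValueError as exc:
    message = str(exc)

print(message)  # dictionary update sequence element #0 has length 1; 2 is required

# The raw key:"value" cells from the question work as expected.
raw = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"']])
ok = to_frame(raw)
```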

Given your example input of:



time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"


This comes from your os.popen command. Filter out the blank lines and shlex.split each remaining line, so that whitespace inside quoted items is preserved (while the quotes themselves are removed), e.g.:



import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]


For example, rows[0] is:



['time:11:22:20.000',
'instance:(null)',
'id:723927731576482920',
'channel:sip:confctl.com',
'type:control',
'elapsedtime:0.000631',
'level:info',
'operation:Init',
'message:Initialize (version 4.9.0002.30618) ... ']


You then partition each item on the first : to separate the identifier from the value, and feed the resulting dicts into pd.DataFrame, e.g.:



df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)


This gives you a df of:



            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
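To try these steps without running the command, a string can stand in for os.popen(...).read(). A sketch; raw_output is a hypothetical name, and its two lines are copied from the question's log:

```python
import shlex
import pandas as pd

# Stand-in for os.popen("...").read(): two log lines from the question, blank line between.
raw_output = "\n".join([
    'time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" '
    'type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" '
    'message:"Initialize (version 4.9.0002.30618) ... "',
    '',
    'time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" '
    'operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"',
])

# shlex.split keeps quoted spaces together and drops the quotes themselves.
rows = [shlex.split(line) for line in raw_output.splitlines() if line.strip()]

# partition(':') splits on the first ':' only; [::2] keeps (key, value) and drops the ':'.
df = pd.DataFrame([dict(item.partition(':')[::2] for item in row) for row in rows])
print(df[['time', 'instance', 'operation']])
```

Because partition stops at the first ':', values that contain colons of their own (times, sip: channels, URLs) come through whole.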

  • This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has limited lines; I mean there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?

    – user10177566
    Nov 19 '18 at 17:29

  • @ak333 you can... but please make sure adding that doesn't change the meaning of your question and invalidate the answers already given.

    – Jon Clements
    Nov 19 '18 at 17:31

  • I have added that. I am getting a list of lines on the command prompt and I am storing them using popen. Just that one long line, which gives me the stats, is causing the whole trouble, as it is different from all the other lines.

    – user10177566
    Nov 19 '18 at 17:34

  • I hope I edited it as per your expectation. I don't mind getting two dataframes, one for the line I added and one for the rest of the lines.

    – user10177566
    Nov 19 '18 at 17:40

  • @ak333 does that line actually end in ', because shlex.split won't be happy with that... you can try stripping ', from each line before splitting them and see if that works... that might give you one big df, and then you filter on columns that you know only belong to those entry types etc...

    – Jon Clements
    Nov 19 '18 at 17:50
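Along the lines of the last comment, both entry types can go into one big df and then be separated again. A sketch, assuming each line looks like the ones in the question (no stray trailing ',) and that the type field ('control' or 'media') tells the entry types apart; dropna(axis=1, how='all') then drops the columns that only the other type fills:

```python
import shlex
import pandas as pd

# One 'control' line and the 'media' stats line from the question.
lines = [
    'time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" '
    'operation:"Login" message:"Logged in .tester_food."',
    'time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" '
    'channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" '
    'incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" '
    'incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" '
    'predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" '
    'latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"',
]

rows = [shlex.split(line) for line in lines]

# One frame for everything; keys missing from a line become NaN in that row.
df = pd.DataFrame([dict(item.partition(':')[::2] for item in row) for row in rows])

# One frame per entry type, keeping only the columns that type actually fills.
control = df[df['type'] == 'control'].dropna(axis=1, how='all')
media = df[df['type'] == 'media'].dropna(axis=1, how='all')
```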

Although the question already has an answer, I would like to add a regex-based approach that achieves the same result. Here abc has its columns renamed to time, instance and id, as in the question:



>>> abc
                  time                            instance                        id
0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"


Just call DataFrame.replace with regex=True:



>>> abc.replace('instance:|id:|time:|"|[()]', '', regex=True)
           time               instance                   id
0  08:59:38.000                   null  3214039276626790405
1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

OR

# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)


Regex explanation:

  • 1st alternative, instance:, matches the characters instance: literally (case sensitive)

  • 2nd alternative, id:, matches the characters id: literally (case sensitive)

  • 3rd alternative, time:, matches the characters time: literally (case sensitive)

  • 4th alternative, ", matches the character " literally

  • 5th alternative, [()], matches a single character from the set ( )
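The regex alternatives above can also be written so the pattern doesn't need to list every key. A sketched variant (not taken from the answer): ^[^:]+:" removes whatever key precedes the first :" at the start of a cell, so new keys need no change to the pattern, and "$ removes the closing quote. Parentheses are then dropped in a second pass with [()], as the answer does:

```python
import pandas as pd

abc = pd.DataFrame(
    [['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
     ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"']],
    columns=['time', 'instance', 'id'],
)

# ^[^:]+:"  -> the key and the opening quote (anchored at the start of the cell)
# "$        -> the closing quote at the end of the cell
clean = abc.replace(r'^[^:]+:"|"$', '', regex=True)

# Second pass, as in the answer: drop parentheses anywhere in a cell.
clean = clean.replace(r'[()]', '', regex=True)
print(clean)
```

The anchors matter here: without ^, [^:]+:" could also match inside a value that happens to contain :".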














    • its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

      – user10177566
      Nov 19 '18 at 16:30











    • @ak333, I'm using your definition of abc and it works fine for me.

      – jpp
      Nov 19 '18 at 16:31











    • i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

      – user10177566
      Nov 19 '18 at 16:32











    • @ak333, That's the problem, use abc as you defined in your question.

      – jpp
      Nov 19 '18 at 16:35











    • I apologize..i got it. @jpp

      – user10177566
      Nov 19 '18 at 16:38

































    Given your example input of:



    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"


    which comes from your os.popen command, we filter out blank lines and shlex.split each remaining line, so that whitespace inside quoted items is preserved (while the quotes themselves are removed), e.g.:



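    import os
    import shlex
    import pandas as pd

    # Read the command output, skip blank lines, and split each line on
    # whitespace; shlex keeps quoted text together and strips the quotes
    output = os.popen("my command linecode").read()
    rows = [shlex.split(line) for line in output.splitlines() if line.strip()]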


    This gives you, for example, a rows[0] of:



    ['time:11:22:20.000',
    'instance:(null)',
    'id:723927731576482920',
    'channel:sip:confctl.com',
    'type:control',
    'elapsedtime:0.000631',
    'level:info',
    'operation:Init',
    'message:Initialize (version 4.9.0002.30618) ... ']


    You then partition each item on its first : to separate the identifier from the value, and feed the resulting dicts into pd.DataFrame, e.g.:



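    # partition(':') splits on the first colon only, so values such as
    # 'sip:confctl.com' and '11:22:20.000' stay intact; [::2] keeps (key, value)
    df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)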


    This gives you a df of:



                channel elapsedtime                  id               instance  level                                            message operation          time     type
    0 sip:confctl.com 0.000631 723927731576482920 (null) info Initialize (version 4.9.0002.30618) ... Init 11:22:20.000 control
    1 sip:confctl.com 0.067122 723927731576482920 Ops-MacBook-Pro.local info Connecting to https://hrpd.www.vivox.com/api2/ Connect 11:22:21.000 control
    2 sip:confctl-.com 2.685700 723927731576482920 Ops-MacBook-Pro.local info Connected to https://hrpd.www.vivox.com/api2/ Connect 11:22:23.000 control
    3 sip:confctl-.com 2.814268 723927731576482920 Ops-MacBook-Pro.local info Logged in .tester_food. Login 11:22:23.000 control
    4 sip:confctl-.com 2.912255 723927731576482920 Ops-MacBook-Pro.local error .tester_food. failed to join sip:confctl-2@hrp... Call 11:22:23.000 control
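    Here is a self-contained sketch of the same pipeline that you can run without the command. Two lines copied from the question's log stand in for the os.popen(...) output; the RAW_LOG name is only for illustration. It builds one dict per line as in the answer, using a list comprehension instead of a generator.

```python
import shlex
import pandas as pd

# Stand-in for os.popen("my command linecode").read(): two lines from the question
RAW_LOG = '''time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"
'''

# shlex.split keeps quoted whitespace together and drops the quotes
rows = [shlex.split(line) for line in RAW_LOG.splitlines() if line.strip()]

# partition(':') splits on the first colon only: 'channel:sip:confctl.com'
# becomes ('channel', ':', 'sip:confctl.com'); [::2] keeps key and value
df = pd.DataFrame([dict(col.partition(':')[::2] for col in row) for row in rows])

print(df[['time', 'instance', 'operation', 'message']])
```

    All values stay strings, so convert them where needed, e.g. with pd.to_numeric(df['elapsedtime']). Depending on your pandas version, the columns may come out in first-seen order (time, instance, id, ...) rather than alphabetically as in the table above.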















    answered Nov 19 '18 at 17:23









    Jon Clements














    • This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has limited lines; there is just one different line in that log which is causing me issues. Can you help me with that if I edit it into the question?

      – user10177566
      Nov 19 '18 at 17:29











      @ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.

      – Jon Clements
      Nov 19 '18 at 17:31











    • I have added that. I get a list of lines at the command prompt and store them using popen. Just that one long line, which gives me the stats, causes the whole trouble because it's different from all the other lines.

      – user10177566
      Nov 19 '18 at 17:34











    • I hope I edited it as you expected. I don't mind getting two dataframes: one for the line I added and one for the rest of the lines.

      – user10177566
      Nov 19 '18 at 17:40











    • @ak333 does that line actually end in ', ? shlex.split won't be happy with that. You can try stripping ', from each line before splitting and see if that works. That might give you one big df, and then you filter on the columns that you know only belong to those entry types etc.

      – Jon Clements
      Nov 19 '18 at 17:50
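    This sketch follows the suggestion in the comment above. It strips a trailing ', before splitting and sets aside any line that shlex still cannot parse, because an unmatched quote makes shlex.split raise ValueError. The regular lines and the odd ones can then go into two separate DataFrames. The split_log_lines name is hypothetical. The actual stats line from the edited question is not shown here, so the bad line in the example is made up.

```python
import shlex


def split_log_lines(lines):
    """Hypothetical helper: return (parsed_rows, rejected_lines).

    Each non-blank line has trailing whitespace and any trailing ' and ,
    characters removed before shlex.split; lines that still fail (e.g. with
    "No closing quotation") are returned unchanged in rejected_lines.
    """
    parsed, rejected = [], []
    for line in lines:
        if not line.strip():
            continue                      # skip blank lines
        try:
            parsed.append(shlex.split(line.rstrip().rstrip("',")))
        except ValueError:                # shlex could not tokenise the line
            rejected.append(line)
    return parsed, rejected


sample = [
    'time:"11:22:20.000" level:"info"',
    '',
    "time:\"11:22:21.000\" level:\"info\"',",    # trailing ', is stripped first
    'stats:"unterminated value',                  # still unbalanced: rejected
]
rows, odd_lines = split_log_lines(sample)
print(rows)       # [['time:11:22:20.000', 'level:info'], ['time:11:22:21.000', 'level:info']]
print(odd_lines)  # ['stats:"unterminated value']
```

    The parsed rows can then go through the partition step from the answer, while odd_lines can be parsed separately.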

































    Although this has already been answered, I'd like to add a regex-based approach that achieves the same result (here abc already has its columns renamed to time, instance and id, as in the question):



    >>> abc
                      time                            instance                        id
    0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
    1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
    2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"


    Just apply DataFrame.replace with regex=True:



    >>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)
               time               instance                   id
    0  08:59:38.000                   null  3214039276626790405
    1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
    2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

    OR

    # abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)


    Regex explanation:





    • 1st alternative: instance: matches the characters instance: literally (case sensitive)


    • 2nd alternative: id: matches the characters id: literally (case sensitive)


    • 3rd alternative: time: matches the characters time: literally (case sensitive)


    • 4th alternative: " matches the character " literally


    • 5th alternative: [()] matches a single character from the set ( or )

    Every match is replaced with an empty string, which leaves only the values.
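    If you would rather not list every key in the pattern, here is a sketch that starts from abc as first defined in the question (positional columns 0, 1, 2). It assumes each column holds the same key on every row. The column names come from the keys in the first row, and Series.str.extract pulls out whatever sits between the quotes. Unlike the replace above, this keeps the parentheses around (null); chain .str.strip('()') if you want them gone.

```python
import pandas as pd

# Sample data from the question (three positional columns 0, 1, 2)
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])

# Capture everything between the first ':"' and the closing quote
VALUE_RE = r'^[^:]+:"(.*)"$'

# Column names taken from the keys in the first row
# (assumes every row has the same key in the same column)
names = [cell.split(':', 1)[0] for cell in abc.iloc[0]]

# Extract the quoted value from every cell, column by column
res = abc.apply(lambda col: col.str.extract(VALUE_RE, expand=False))
res.columns = names
print(res)
```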
















        edited Nov 19 '18 at 17:31

























        answered Nov 19 '18 at 16:55









    pygo
