Filtering DataFrame values within quotes

I am generating a DataFrame from command-line output with code like the following:

df_output_lines = [s.split() for s in os.popen("my command linecode").read().splitlines()]

df_output_lines  = list(filter(None, df_output_lines))

and then converting it into a DataFrame:

df=pd.DataFrame(df_output_lines)

df

The data is in the following format:

abc = pd.DataFrame([['time:"08:59:38.000"', 'instance:"(null)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"'],['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"','id:"3214039276626790405"']])

abc


I want to transform it so that the text before the : becomes the column name and the text inside the quotes " " becomes the value, and the same goes for all columns. The output should look like this:
[image: desired output table]

At the moment I am doing it the hard way:

abc.rename(columns={0:'time',1:'instance',2:'id'},inplace=True)

and then

abc['time'] = abc['time'].map(lambda x: str(x)[:-1])

abc['time'] = abc['time'].map(lambda x: str(x)[6:])



abc['instance'] = abc['instance'].map(lambda x: str(x)[:-1])

abc['instance'] = abc['instance'].map(lambda x: str(x)[10:])



abc['id'] = abc.id.str.extract(r'(\d+)', expand=True).astype(int)

Is there a lambda expression or a one-liner that does this?

The raw log output looks like this:

    time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "



    time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."



    time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"



 time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"

edited Nov 19 '18 at 17:53

asked Nov 19 '18 at 16:14

user10177566

1

This looks like you didn't call the correct DataFrame constructor. Did you start with a dictionary or json?

– ALollz
Nov 19 '18 at 16:15

This log is generated on the command line and I am capturing it in a DataFrame with code.

– user10177566
Nov 19 '18 at 16:17

2

I'm thinking the same as @ALollz here... what do a few lines of your raw log file look like? Loading it in a different way from the start is likely to be much easier and more reliable...

– Jon Clements♦
Nov 19 '18 at 16:20

@JonClements I have edited my question.

– user10177566
Nov 19 '18 at 16:21

@ALollz I have edited my question.

– user10177566
Nov 19 '18 at 16:21

python python-3.x pandas dataframe lambda


3 Answers

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

print(res)

                    id                 instance          time
0  3214039276626790405                   (null)  08:59:38.000
1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000
2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.
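The one-liner above assumes every cell has the form key:"value". As a minimal sketch (not part of the original answer), the variant below uses str.partition so that cells without that form, such as already-cleaned strings or None padding, are skipped instead of raising the "dictionary update sequence element" error. The helper names cell_to_pair and row_to_dict are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Sample frame copied from the question.
abc = pd.DataFrame([
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
])


def cell_to_pair(cell):
    """Turn 'key:"value"' into (key, value); return None if the cell has another shape."""
    key, sep, value = str(cell).partition(':"')
    if not sep or not value.endswith('"'):
        return None
    return key, value[:-1]  # drop the closing quote


def row_to_dict(row):
    """Build one record per row, ignoring cells that could not be parsed."""
    pairs = (cell_to_pair(cell) for cell in row)
    return dict(pair for pair in pairs if pair is not None)


res = pd.DataFrame([row_to_dict(row) for row in abc.values])
print(res)
```

With this guard, a row of plain values such as ['08:59:38.000', '(null)'] yields an empty record rather than an exception, which makes it easier to spot input that is not in the expected format.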

answered Nov 19 '18 at 16:22

jpp


It's giving me an error: "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

I am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) and then res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize, I've got it. @jpp

– user10177566
Nov 19 '18 at 16:38


Given your example input of:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "



time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."



time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

This comes from your os.popen command. We filter out blank lines and use shlex.split on each line so that whitespace inside quoted items is preserved (but the quotes themselves are removed), e.g.:

import os
import shlex
import pandas as pd

rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]

This'll give you, for example rows[0] of:

['time:11:22:20.000',
 'instance:(null)',
 'id:723927731576482920',
 'channel:sip:confctl.com',
 'type:control',
 'elapsedtime:0.000631',
 'level:info',
 'operation:Init',
 'message:Initialize (version 4.9.0002.30618) ... ']

You then partition those on : to separate the identifier from the value and feed that into a pd.DataFrame, eg:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

Giving you a df of:

            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
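For anyone who wants to try the two steps without the external command, here is a self-contained sketch. It assumes the command output has already been read into a list of strings; raw_lines below holds two lines copied from the question plus a blank line, standing in for os.popen(...).read().splitlines(). It also shows why plain str.split is not enough: the message field contains spaces.

```python
import shlex

import pandas as pd

# Stand-in for os.popen("...").read().splitlines(); lines copied from the question.
raw_lines = [
    'time:"11:22:20.000" instance:"(null)" id:"723927731576482920" '
    'channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" '
    'level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "',
    '',
    'time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" '
    'level:"info" operation:"Login" message:"Logged in .tester_food."',
]

# shlex.split honours the double quotes, so each key:"value" pair stays one token.
rows = [shlex.split(line) for line in raw_lines if line.strip()]

# Partition on the first ':' only, so values such as 'sip:confctl.com' stay intact.
records = [dict(field.partition(':')[::2] for field in row) for row in rows]
df = pd.DataFrame(records)

# For comparison: str.split breaks the quoted message into several tokens.
naive_tokens = raw_lines[0].split()

print(len(rows[0]), len(naive_tokens))
print(df[['time', 'operation', 'message']])
```

All values remain strings at this point; converting columns such as elapsedtime to numbers would be a separate step.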

answered Nov 19 '18 at 17:23

Jon Clements♦


This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has only a limited set of lines; there is just one different line in that log that is causing me trouble. Can you help me with that if I add it to the question?

– user10177566
Nov 19 '18 at 17:29

1

@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.

– Jon Clements♦
Nov 19 '18 at 17:31

I have added that. I am getting a list of lines on the command prompt and storing it using popen. It is just that one long line with the stats that is causing all the trouble, as it is different from all the other lines.

– user10177566
Nov 19 '18 at 17:34

I hope I edited it as you expected. I don't mind getting two dataframes: one for the line I added and one for the other lines.

– user10177566
Nov 19 '18 at 17:40

@ak333 does that line actually end in ',? Because shlex.split won't be happy with that. You can try stripping ', from each line before splitting and see if that works. That might give you one big df, which you can then filter on the columns that you know belong only to those entry types.

– Jon Clements♦
Nov 19 '18 at 17:50
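The suggestion in the comment above could be sketched roughly as follows. This is an illustration under assumptions, not a tested fix for the real log: it assumes the stray characters, if any, are a quote and a comma at the ends of a line, and it splits the combined frame on the type column, which appears in both kinds of line in the question. The helper names clean and parse_line are hypothetical.

```python
import shlex

import pandas as pd

# One "control" line and the "media" stats line, both copied from the question.
raw_lines = [
    'time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" '
    'level:"info" operation:"Login" message:"Logged in .tester_food."',
    'time:"12:30:41.000" instance:"Ops-MacBook-Pro.local" id:"10316899144153251411" '
    'channel:"sip:confctl-2@hrpd.vivox.com" type:"media" sampleperiod:"0.000000" '
    'incomingpktsreceived:"0" incomingpktsexpected:"0" incomingpktsloss:"0" '
    'incomingpktssoutoftime:"0" incomingpktsdiscarded:"0" outgoingpktssent:"0" '
    'predictedmos:"3" latencypktssent:"0" latencycount:"0" latencysum:"0.000000" '
    'latencymin:"0.000000" latencymax:"0.000000" callid:"2477580077" r_factor:"0.000000"',
]


def clean(line):
    """Strip whitespace, then any stray ' or , characters at either end (assumed debris)."""
    return line.strip().strip("',")


def parse_line(line):
    """Split one log line into a {key: value} dict."""
    return dict(field.partition(':')[::2] for field in shlex.split(clean(line)))


df = pd.DataFrame([parse_line(line) for line in raw_lines if line.strip()])

# One frame per entry type, keeping only the columns that type actually uses.
frames = {kind: group.dropna(axis=1, how='all') for kind, group in df.groupby('type')}
control_df = frames['control']
media_df = frames['media']

print(list(control_df.columns))
print(list(media_df.columns))
```

The values are left as strings. Note that the id on the media line (10316899144153251411) is larger than the signed 64-bit maximum, so converting that column with astype(int) would overflow.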


Although the question has already been answered, I would like to add a regex-based approach that achieves the same result:

>>> abc

                  time                            instance                        id

0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"

1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

Just use DataFrame.replace with regex=True:

>>> abc.replace('instance:|id:|time:|"|[()]', '',regex=True)

           time               instance                   id

0  08:59:38.000                   null  3214039276626790405

1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405



OR   



# abc.replace('(instance:|id:|time:)|"|[()]', '',regex=True)

Regex explanation:

1st alternative, instance: matches the characters instance: literally (case sensitive).

2nd alternative, id: matches the characters id: literally (case sensitive).

3rd alternative, time: matches the characters time: literally (case sensitive).

4th alternative, " matches the character " literally.

5th alternative, [()] matches a single character from the set, that is, either ( or ).
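The pattern above has to list every key by name. As a different regex sketch (not what this answer does), re.findall with a generic key:"value" pattern can parse whole raw log lines without knowing the keys in advance. It assumes that values never contain an embedded double quote; the name PAIR is only illustrative.

```python
import re

import pandas as pd

# key = word characters, value = everything up to the next double quote.
PAIR = re.compile(r'(\w+):"([^"]*)"')

# One raw log line copied from the question.
raw_lines = [
    'time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" '
    'level:"error" operation:"Call" message:".tester_food. failed to join '
    'sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"',
]

# findall returns (key, value) tuples, which dict() accepts directly.
df = pd.DataFrame([dict(PAIR.findall(line)) for line in raw_lines if line.strip()])

print(df.loc[0, 'message'])
```

Because each match consumes the whole quoted value, colons inside a value (for example in the channel or message fields) are not mistaken for key separators.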

edited Nov 19 '18 at 17:31

answered Nov 19 '18 at 16:55

pygo


add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53378687%2ffiltering-a-df-values-within-quotes%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize..i got it. @jpp

– user10177566
Nov 19 '18 at 16:38

|
show 1 more comment

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize..i got it. @jpp

– user10177566
Nov 19 '18 at 16:38

|
show 1 more comment

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

Feed list of dictionaries to `pd.DataFrame`

The pd.DataFrame constructor accepts a list of dictionaries directly. You can use str.rstrip and str.split within a list comprehension:

res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])



print(res)



                    id                 instance          time

0  3214039276626790405                   (null)  08:59:38.000

1  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

2  3214039276626790405  (Ops-MacBook-Pro.local)  08:59:38.000

It's unclear what logic you use to determine only 'null' strings are surrounded by parentheses.

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

answered Nov 19 '18 at 16:22

jpp

97.7k2159109

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize..i got it. @jpp

– user10177566
Nov 19 '18 at 16:38

|
show 1 more comment

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize..i got it. @jpp

– user10177566
Nov 19 '18 at 16:38

its giving me an error "dictionary update sequence element #0 has length 1; 2 is required"

– user10177566
Nov 19 '18 at 16:30

@ak333, I'm using your definition of abc and it works fine for me.

– jpp
Nov 19 '18 at 16:31

i am using abc = pd.DataFrame([['08:59:38.000', '(null)','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405'],['08:59:38.000', 'Ops-MacBook-Pro.local','3214039276626790405']]) res = pd.DataFrame([dict(i.rstrip('"').split(':"') for i in row) for row in abc.values])

– user10177566
Nov 19 '18 at 16:32

@ak333, That's the problem, use abc as you defined in your question.

– jpp
Nov 19 '18 at 16:35

I apologize..i got it. @jpp

– user10177566
Nov 19 '18 at 16:38

|
show 1 more comment

Given your example input of:

time:"11:22:20.000" instance:"(null)" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "

time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.685700" level:"info" operation:"Connect" message:"Connected to https://hrpd.www.vivox.com/api2/"

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.814268" level:"info" operation:"Login" message:"Logged in .tester_food."

time:"11:22:23.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" channel:"sip:confctl-.com" type:"control" elapsedtime:"2.912255" level:"error" operation:"Call" message:".tester_food. failed to join sip:confctl-2@hrpd.vivox.com error:Access token has invalid signature(403)"

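You can split each line with shlex.split, which splits on whitespace but keeps each quoted value together and drops the quotes:

import os
import shlex
import pandas as pd

# Skip blank lines, then split each remaining line into name:value items
rows = [shlex.split(line) for line in os.popen("my command linecode").read().splitlines() if line.strip()]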

This'll give you, for example rows[0] of:

['time:11:22:20.000',
 'instance:(null)',
 'id:723927731576482920',
 'channel:sip:confctl.com',
 'type:control',
 'elapsedtime:0.000631',
 'level:info',
 'operation:Init',
 'message:Initialize (version 4.9.0002.30618) ... ']

You then partition each item on the first : to separate the identifier from the value, and feed the resulting dicts into pd.DataFrame, e.g.:

df = pd.DataFrame(dict(col.partition(':')[::2] for col in row) for row in rows)

Giving you a df of:

            channel elapsedtime                  id               instance  level                                            message operation          time     type
0   sip:confctl.com    0.000631  723927731576482920                 (null)   info           Initialize (version 4.9.0002.30618) ...       Init  11:22:20.000  control
1   sip:confctl.com    0.067122  723927731576482920  Ops-MacBook-Pro.local   info     Connecting to https://hrpd.www.vivox.com/api2/   Connect  11:22:21.000  control
2  sip:confctl-.com    2.685700  723927731576482920  Ops-MacBook-Pro.local   info      Connected to https://hrpd.www.vivox.com/api2/   Connect  11:22:23.000  control
3  sip:confctl-.com    2.814268  723927731576482920  Ops-MacBook-Pro.local   info                            Logged in .tester_food.     Login  11:22:23.000  control
4  sip:confctl-.com    2.912255  723927731576482920  Ops-MacBook-Pro.local  error  .tester_food. failed to join sip:confctl-2@hrp...      Call  11:22:23.000  control
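
The two steps can be tried without running the command. A minimal sketch, using two of the sample lines above in place of the os.popen output; sample_lines and records are names chosen here for illustration:

```python
import shlex

# Two lines copied from the example input, plus a blank line to skip
sample_lines = [
    'time:"11:22:20.000" instance:"(null)" id:"723927731576482920" '
    'channel:"sip:confctl.com" type:"control" elapsedtime:"0.000631" '
    'level:"info" operation:"Init" message:"Initialize (version 4.9.0002.30618) ... "',
    '',
    'time:"11:22:21.000" instance:"Ops-MacBook-Pro.local" id:"723927731576482920" '
    'channel:"sip:confctl.com" type:"control" elapsedtime:"0.067122" '
    'level:"info" operation:"Connect" message:"Connecting to https://hrpd.www.vivox.com/api2/"',
]

records = [
    # partition(':') splits on the FIRST colon only, so values such as
    # 11:22:20.000 or sip:confctl.com stay intact; [::2] drops the separator
    dict(token.partition(':')[::2] for token in shlex.split(line))
    for line in sample_lines
    if line.strip()
]
```

Passing records to pd.DataFrame should give the same data as the frame shown above. Depending on the pandas version, the columns may come out in the order they appear in the log rather than alphabetically.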

answered Nov 19 '18 at 17:23

Jon Clements♦

This is exactly what I was looking for, but I have two kinds of logs. The one I gave you has only a few lines, and there is just one different line in that log which is causing me an issue. Can you help me with that if I edit it into the question?

– user10177566
Nov 19 '18 at 17:29

@ak333 you can... but make sure adding that doesn't change the meaning of your question and invalidate the answers given please.

– Jon Clements♦
Nov 19 '18 at 17:31

I have added that. I am getting a list of lines on the command prompt and I am storing it using popen. Just that one long line, which gives me the stats, is causing the whole trouble, as it's different from all the other lines.

– user10177566
Nov 19 '18 at 17:34

I hope I edited it as you expected. I don't mind getting two dataframes: one for the line which I added and one for the other lines.

– user10177566
Nov 19 '18 at 17:40

@ak333 does that line actually end in ', ? Because shlex.split won't be happy with that... You can try stripping ', from each line before splitting them and see if that works. That might give you one big df, and then you filter on the columns that you know only belong to those entry types, etc.

– Jon Clements♦
Nov 19 '18 at 17:50
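
As a rough sketch of the suggestion in the last comment: strip a trailing ', from each line before handing it to shlex.split, and catch the ValueError that shlex.split raises on an unbalanced quote. The sample lines, the packets field, and the parse_line helper are all invented for illustration, since the actual stats line is not shown here.

```python
import shlex

# Invented sample lines; the second one ends in ', like the troublesome line
raw_lines = [
    'time:"11:22:23.000" level:"info" operation:"Login"',
    'time:"11:22:24.000" level:"info" packets:"120"' + "',",
]


def parse_line(line):
    """Return a dict of name -> value, or None if the line cannot be split."""
    cleaned = line.strip().rstrip("',")  # drop any trailing ' and , characters
    try:
        tokens = shlex.split(cleaned)
    except ValueError:  # raised by shlex on an unbalanced quote
        return None
    return dict(token.partition(':')[::2] for token in tokens)


# Without the stripping step, the unbalanced ' makes shlex.split fail
try:
    shlex.split(raw_lines[1])
    unstripped_failed = False
except ValueError:
    unstripped_failed = True

records = [rec for rec in map(parse_line, raw_lines) if rec is not None]

# Separate the two kinds of entries by a field that only one kind has
stats_records = [rec for rec in records if 'packets' in rec]
other_records = [rec for rec in records if 'packets' not in rec]
```

Each list can then be passed to pd.DataFrame separately, which would give the two dataframes mentioned in the comments.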


Although an answer has already been given, I would like to add a regex-based approach that achieves the same result:

>>> abc
                  time                            instance                        id
0  time:"08:59:38.000"                   instance:"(null)"  id:"3214039276626790405"
1  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"
2  time:"08:59:38.000"  instance:"(Ops-MacBook-Pro.local)"  id:"3214039276626790405"

Just call DataFrame.replace with regex=True:

>>> abc.replace('instance:|id:|time:|"|[()]', '', regex=True)
           time               instance                   id
0  08:59:38.000                   null  3214039276626790405
1  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405
2  08:59:38.000  Ops-MacBook-Pro.local  3214039276626790405

Or, equivalently, with the column names grouped:

>>> abc.replace('(instance:|id:|time:)|"|[()]', '', regex=True)

Regex explanation:

1st alternative: instance: matches the characters instance: literally (case sensitive)
2nd alternative: id: matches the characters id: literally (case sensitive)
3rd alternative: time: matches the characters time: literally (case sensitive)
4th alternative: " matches the character " literally
5th alternative: [()] matches a single character from the set, that is, either ( or )
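
The pattern above lists the column names explicitly and also removes the parentheses. As a possible variation, not part of the original answer, a single pattern with two capture groups can pull out the name and the quoted value for any field while leaving the parentheses in place. CELL and split_cell are hypothetical names used only in this sketch:

```python
import re

# name, then :" , then everything up to the closing quote
CELL = re.compile(r'(\w+):"(.*)"')


def split_cell(cell):
    """Split a cell of the form name:"value" into a (name, value) tuple."""
    match = CELL.fullmatch(cell)
    if match is None:
        raise ValueError(f'cell is not in name:"value" form: {cell!r}')
    return match.group(1), match.group(2)


# Rows as defined in the question
rows = [
    ['time:"08:59:38.000"', 'instance:"(null)"', 'id:"3214039276626790405"'],
    ['time:"08:59:38.000"', 'instance:"(Ops-MacBook-Pro.local)"', 'id:"3214039276626790405"'],
]

# One dict per row; the keys become the column names
records = [dict(split_cell(cell) for cell in row) for row in rows]
```

Passing records to pd.DataFrame would then give the time, instance and id columns without a separate rename step.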

edited Nov 19 '18 at 17:31

answered Nov 19 '18 at 16:55

pygo
