Python: cannot get original format after reading .txt file into dataframe and writing back to file
I have a .txt file that I need to do outlier removal on. The file looks like this:
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"1", "pid":"0", "pidx":"0", "act":"DOWN-1ST", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"0.0007", "vy":"0.0013"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"0", "pidx":"0", "act":"DOWN-P2", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"NaN", "vy":"NaN"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"1", "pidx":"1", "act":"DOWN-P2", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"Infinity", "vy":"-Infinity"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"556.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"-5.3125", "vy":"18.0625"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"5.3125", "vy":"-18.0625"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"555.00", "y":"1044.00", "size":"0.3333", "press":"0.6000", "vx":"-3.4400", "vy":"11.6000"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"3.4400", "vy":"-11.6000"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"554.00", "y":"1045.00", "size":"0.3333", "press":"0.6000", "vx":"-2.6364", "vy":"8.8182"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"2.6364", "vy":"-8.8182"}
{"mille":"802863", "type":"th", "test":"mod6", "hrow":"3", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"553.00", "y":"1047.00", "size":"0.3333", "press":"0.6125", "vx":"-2.0952", "vy":"6.9762"}
................(there are many more lines like this in each file and I have several files)
(Note that there was no blank space between each two {} in the original text file.)
I read it into dataframes with pd.read_json() and finished the outlier removal.
Now I need to write it back to a text file with exactly the same format as before.
Here is my code:
import glob
import json
from collections import OrderedDict

import numpy as np
import pandas as pd

path = 'c:/Users/USER/.spyder-py3/machine-learning/data2/test/*.txt'
filelist = glob.glob(path, recursive = True)
for i in range(0,3):
    df = pd.read_json(filelist[i], lines=True)
    # upper and lower 3-sigma bounds for each column
    outlier_x = df['x'].mean() + df['x'].std() * 3
    outlier_x2 = df['x'].mean() - df['x'].std() * 3
    outlier_y = df['y'].mean() + df['y'].std() * 3
    outlier_y2 = df['y'].mean() - df['y'].std() * 3
    outlier_vx = df['vx'].mean() + df['vx'].std() * 3
    outlier_vx2 = df['vx'].mean() - df['vx'].std() * 3
    outlier_vy = df['vy'].mean() + df['vy'].std() * 3
    outlier_vy2 = df['vy'].mean() - df['vy'].std() * 3
    outlier_pr = df['press'].mean() + df['press'].std() * 3
    outlier_pr2 = df['press'].mean() - df['press'].std() * 3
    outlier_sz = df['size'].mean() + df['size'].std() * 3
    outlier_sz2 = df['size'].mean() - df['size'].std() * 3
    df.drop(['act1','act2','size1','size2','x1','x2','y1','y2'],axis = 1,
            inplace = True)
    df = df[['mille','type','test','xfocus','yfocus','span','sfact','hrow',
             'pcnt','pid','pidx','act','x','y','size','press','vx','vy']]
    # remove outliers for column 'x'
    df = df.drop(df[((df['x'] > outlier_x) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['x'] < outlier_x2) & (df['act'] == 'MOVE'))].index)
    # remove outliers for column 'y'
    df = df.drop(df[((df['y'] > outlier_y) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['y'] < outlier_y2) & (df['act'] == 'MOVE'))].index)
    # remove part of the infinite values from column 'vx'
    df = df.drop(df[(((df['vx'] == np.inf) & (df['act'] == 'MOVE')))].index)
    df = df.drop(df[(((df['vx'] == -np.inf) & (df['act'] == 'MOVE')))].index)
    # replace the remaining infinite values in 'vx' with the column mean
    df['vx'] = df['vx'].replace([np.inf,-np.inf],df['vx'].mean())
    # remove outliers from column 'vx'
    df = df.drop(df[((df['vx'] > outlier_vx) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['vx'] < outlier_vx2) & (df['act'] == 'MOVE'))].index)
    # replace infinite values in 'vy' with the column mean
    df['vy'] = df['vy'].replace([np.inf,-np.inf],df['vy'].mean())
    # fill NaN with 0.0 in column 'vx'
    df['vx'] = df['vx'].fillna(0.0)
    # fill NaN with 0.0 in column 'vy'
    df['vy'] = df['vy'].fillna(0.0)
    # remove outliers from column 'vy'
    df = df.drop(df[((df['vy'] > outlier_vy) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['vy'] < outlier_vy2) & (df['act'] == 'MOVE'))].index)
    # remove outliers from column 'press'
    df = df.drop(df[((df['press'] > outlier_pr) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['press'] < outlier_pr2) & (df['act'] == 'MOVE'))].index)
    # remove outliers from column 'size'
    df = df.drop(df[((df['size'] > outlier_sz) & (df['act'] == 'MOVE'))].index)
    df = df.drop(df[((df['size'] < outlier_sz2) & (df['act'] == 'MOVE'))].index)
    df.loc[df.xfocus.notnull(), ['vx','vy']] = np.nan,np.nan
    col_select = ['mille','type','test','xfocus','yfocus','span','sfact','hrow',
                  'pcnt','pid','pidx','act','x','y','size','press','vx','vy']
    # convert the dataframe to a JSON string of records
    jsonresult = df.to_json(orient='records')
    # read the json string to get a list of dictionaries
    rows = json.loads(jsonresult)
    # remove some null values
    new_rows = [OrderedDict([(key, row[key]) for key in col_select
                             if (key in row) and pd.notnull(row[key])])
                for row in rows]
    jsonfile = json.dumps(new_rows)
    # save them into destination
    outfile = ("c:/Users/USER/.spyder-py3/machine-learning/data2/testresult/user_"
               + str(i) + "_mod6.txt")
    thefile = open(outfile, 'w')
    json_output = jsonfile.strip("").split('},')
    for j in range(len(json_output)):
        json_output[j] = json_output[j] + '}'
    for item in json_output:
        thefile.write("%s\n" % item)
    thefile.close()
I tried to get a txt file just like the original one, and the output does look similar. But when I tried to read the cleaned txt file and do other operations on it, I got an error like this: JSONDecodeError: Extra data: line 1 column 201 (char 200). The entire error message is below:
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-7-a2c25911084b> in <module>()
2321 print('-----------------------test where am I--------------------------------')
2322 for line in file_object:
-> 2323 jrecord = json.loads(line)
2324 try:
2325 typ = jrecord['type']
~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
352 parse_int is None and parse_float is None and
353 parse_constant is None and object_pairs_hook is None and not kw):
--> 354 return _default_decoder.decode(s)
355 if cls is None:
356 cls = JSONDecoder
~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
340 end = _w(s, end).end()
341 if end != len(s):
--> 342 raise JSONDecodeError("Extra data", s, end)
343 return obj
344
JSONDecodeError: Extra data: line 1 column 201 (char 200)
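For reference, `json.loads` reports "Extra data" when it has parsed one complete JSON value and then finds more non-whitespace text after it on the same input. The snippet below is only a minimal sketch of that behaviour with made-up records; `try_parse` is a hypothetical helper, not part of my script.

```python
import json

# Two JSON objects on a single line: the first parses, the rest is "extra data".
two_objects = '{"pid":"0", "act":"MOVE"} {"pid":"1", "act":"MOVE"}'
one_object = '{"pid":"0", "act":"MOVE"}'


def try_parse(text):
    """Return the parsed value, or the decoder's error message on failure."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.msg


result_bad = try_parse(two_objects)
result_good = try_parse(one_object)
print(result_bad)
print(result_good)
```

So a line in the cleaned file that triggers this error holds more than one complete value, or has leftover characters after the first one.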
There was no such error when I dealt with the .txt file that had not been cleaned, so obviously something goes wrong when I write the data back. Now I am stuck and do not know how to move on. Can anybody help me out? Thanks in advance!
python json readfile
edited Nov 13 at 18:07
Vasilis G.
asked Nov 13 at 17:56
Leran
1 Answer
accepted
You should be able to write your cleaned dataframe out with df.to_json(outfile, orient='records', lines=True)
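A minimal sketch of that round trip, assuming a tiny made-up sample file and a stand-in filter (`x < 1000`) in place of the real outlier logic; the file names `raw.txt` and `clean.txt` are hypothetical:

```python
import json
import os
import tempfile

import pandas as pd

# Made-up sample in the same one-object-per-line layout as the question.
sample = (
    '{"mille":"802837", "act":"MOVE", "x":"556.00", "y":"1043.00"}\n'
    '{"mille":"802846", "act":"MOVE", "x":"555.00", "y":"1044.00"}\n'
    '{"mille":"802854", "act":"MOVE", "x":"9999.00", "y":"1045.00"}\n'
)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "raw.txt")
    dst = os.path.join(tmp, "clean.txt")
    with open(src, "w") as fh:
        fh.write(sample)

    df = pd.read_json(src, lines=True)
    # Stand-in for the real outlier removal: drop rows with x >= 1000.
    numeric_x = pd.to_numeric(df["x"])
    df = df[numeric_x < 1000]

    # One JSON object per line, no enclosing list, no manual splitting.
    df.to_json(dst, orient="records", lines=True)

    # Every non-empty line of the result parses on its own.
    with open(dst) as fh:
        records = [json.loads(line) for line in fh if line.strip()]

print(len(records))
```

One caveat: `read_json` generally infers numeric types, so values that were quoted in the input (such as "556.00") may be written back as plain numbers rather than as the original strings.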
Thanks, it works! I thought too much of it so that made this problem complicated.
– Leran
Nov 13 at 20:03
answered Nov 13 at 18:17
Kirk
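If the output has to match the input character for character (quoted values, ", " between fields, ":" with no space), one alternative is to skip the dataframe for the writing step and use the standard `json` module with explicit separators. This is only a sketch under assumptions: the sample lines are made up and `keep` is a hypothetical stand-in for the real outlier test.

```python
import json

# Made-up input lines in the question's layout.
raw_lines = [
    '{"mille":"802837", "act":"MOVE", "x":"556.00", "y":"1043.00"}',
    '{"mille":"802846", "act":"MOVE", "x":"9999.00", "y":"1044.00"}',
]


def keep(record):
    """Hypothetical filter: keep rows whose x value is below 1000."""
    return float(record["x"]) < 1000.0


records = [json.loads(line) for line in raw_lines]
kept = [record for record in records if keep(record)]

# separators=(", ", ":") reproduces the spacing used in the original file;
# values stay strings because they were never converted to numbers.
out_lines = [json.dumps(record, separators=(", ", ":")) for record in kept]

for line in out_lines:
    print(line)
```

Because the values are never converted, strings such as "556.00" keep their trailing zeros, and each output line is a single JSON object that `json.loads` can read back.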