Python: cannot get original format after reading .txt file into dataframe and writing back to file























I have a .txt file that I need to do outlier removal on. The file looks like this:



{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"1", "pid":"0", "pidx":"0", "act":"DOWN-1ST", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"0.0007", "vy":"0.0013"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"0", "pidx":"0", "act":"DOWN-P2", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"NaN", "vy":"NaN"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"1", "pidx":"1", "act":"DOWN-P2", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"Infinity", "vy":"-Infinity"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"556.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"-5.3125", "vy":"18.0625"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"5.3125", "vy":"-18.0625"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"555.00", "y":"1044.00", "size":"0.3333", "press":"0.6000", "vx":"-3.4400", "vy":"11.6000"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"3.4400", "vy":"-11.6000"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"554.00", "y":"1045.00", "size":"0.3333", "press":"0.6000", "vx":"-2.6364", "vy":"8.8182"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"2.6364", "vy":"-8.8182"}
{"mille":"802863", "type":"th", "test":"mod6", "hrow":"3", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"553.00", "y":"1047.00", "size":"0.3333", "press":"0.6125", "vx":"-2.0952", "vy":"6.9762"}


................(there are many more lines like this in each file and I have several files)



(Note that there is no blank space between any two {} records in the original text file.)



I read each file into a dataframe with pd.read_json(..., lines=True) and finished the outlier removal.
Now I need to write the data back to a text file in exactly the same format as before.



Here is my code:



import glob
import json
from collections import OrderedDict

import numpy as np
import pandas as pd

path = 'c:/Users/USER/.spyder-py3/machine-learning/data2/test/*.txt'
filelist = glob.glob(path, recursive=True)

for i in range(0, 3):
    df = pd.read_json(filelist[i], lines=True)

    outlier_x = df['x'].mean() + df['x'].std() * 3
    outlier_x2 = df['x'].mean() - df['x'].std() * 3
    outlier_y = df['y'].mean() + df['y'].std() * 3
    outlier_y2 = df['y'].mean() - df['y'].std() * 3
    outlier_vx = df['vx'].mean() + df['vx'].std() * 3
    outlier_vx2 = df['vx'].mean() - df['vx'].std() * 3
    outlier_vy = df['vy'].mean() + df['vy'].std() * 3
    outlier_vy2 = df['vy'].mean() - df['vy'].std() * 3
    outlier_pr = df['press'].mean() + df['press'].std() * 3
    outlier_pr2 = df['press'].mean() - df['press'].std() * 3
    outlier_sz = df['size'].mean() + df['size'].std() * 3
    outlier_sz2 = df['size'].mean() - df['size'].std() * 3

    df.drop(['act1', 'act2', 'size1', 'size2', 'x1', 'x2', 'y1', 'y2'],
            axis=1, inplace=True)
    df = df[['mille', 'type', 'test', 'xfocus', 'yfocus', 'span', 'sfact', 'hrow',
             'pcnt', 'pid', 'pidx', 'act', 'x', 'y', 'size', 'press', 'vx', 'vy']]

    # remove outliers from column 'x'
    df = df.drop(df[(df['x'] > outlier_x) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['x'] < outlier_x2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'y'
    df = df.drop(df[(df['y'] > outlier_y) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['y'] < outlier_y2) & (df['act'] == 'MOVE')].index)
    # remove part of the infinite values from column 'vx'
    df = df.drop(df[(df['vx'] == np.inf) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vx'] == -np.inf) & (df['act'] == 'MOVE')].index)
    # replace the remaining infinite values in 'vx' with the column mean
    df['vx'] = df['vx'].replace([np.inf, -np.inf], df['vx'].mean())
    # remove outliers from column 'vx'
    df = df.drop(df[(df['vx'] > outlier_vx) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vx'] < outlier_vx2) & (df['act'] == 'MOVE')].index)
    # replace infinite values in 'vy' with the column mean
    df['vy'] = df['vy'].replace([np.inf, -np.inf], df['vy'].mean())
    # fill NaN with 0 in column 'vx'
    df['vx'] = df['vx'].fillna(0.0)
    # fill NaN with 0 in column 'vy'
    df['vy'] = df['vy'].fillna(0.0)
    # remove outliers from column 'vy'
    df = df.drop(df[(df['vy'] > outlier_vy) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vy'] < outlier_vy2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'press'
    df = df.drop(df[(df['press'] > outlier_pr) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['press'] < outlier_pr2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'size'
    df = df.drop(df[(df['size'] > outlier_sz) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['size'] < outlier_sz2) & (df['act'] == 'MOVE')].index)

    df.loc[df.xfocus.notnull(), ['vx', 'vy']] = np.nan, np.nan

    col_select = ['mille', 'type', 'test', 'xfocus', 'yfocus', 'span', 'sfact', 'hrow',
                  'pcnt', 'pid', 'pidx', 'act', 'x', 'y', 'size', 'press', 'vx', 'vy']

    # convert the dataframe to a JSON string of records
    jsonresult = df.to_json(orient='records')
    # parse the JSON string to get a list of dictionaries
    rows = json.loads(jsonresult)

    # drop null values and keep the original column order
    new_rows = [OrderedDict([(key, row[key]) for key in col_select
                             if (key in row) and pd.notnull(row[key])])
                for row in rows]

    jsonfile = json.dumps(new_rows)

    # save the result to the destination file
    outfile = ("c:/Users/USER/.spyder-py3/machine-learning/data2/testresult/user_"
               + str(i) + "_mod6.txt")
    thefile = open(outfile, 'w')

    # split the JSON string back into one record per line
    json_output = jsonfile.strip("").split('},')

    for i in range(len(json_output)):
        json_output[i] = json_output[i] + '}'

    for item in json_output:
        thefile.write("%s\n" % item)


I tried to get a .txt file just like the original one, and the output does look similar. But when I try to read the cleaned .txt file and do other operations on it, I get an error like this: JSONDecodeError: Extra data: line 1 column 201 (char 200). The entire error message is below:



---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-7-a2c25911084b> in <module>()
   2321 print('-----------------------test where am I--------------------------------')
   2322 for line in file_object:
-> 2323     jrecord = json.loads(line)
   2324     try:
   2325         typ = jrecord['type']

~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352         parse_int is None and parse_float is None and
    353         parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    340         end = _w(s, end).end()
    341         if end != len(s):
--> 342             raise JSONDecodeError("Extra data", s, end)
    343         return obj
    344

JSONDecodeError: Extra data: line 1 column 201 (char 200)


There was no such error when I processed the .txt files before cleaning, so obviously something goes wrong when I write the data back. I am stuck now and do not know what to do to move on. Can anybody help me out? Thanks in advance!
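If it helps, here is a minimal illustration of when json.loads raises that exception, as I understand it (the two glued-together objects below are made up for the example, not taken from my data): the decoder finishes the first JSON value on the line and then complains about the extra characters that follow it.

import json

# two JSON objects concatenated on a single line (made-up example)
line = '{"a": 1}{"b": 2}'
try:
    json.loads(line)
except json.JSONDecodeError as e:
    print(e)  # Extra data: line 1 column 9 (char 8)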










      python json readfile






asked Nov 13 at 17:56 by Leran
edited Nov 13 at 18:07 by Vasilis G.
























1 Answer

















answered Nov 13 at 18:17 by Kirk (accepted)










          You should be able to write your cleaned dataframe out with df.to_json(outfile, orient='records', lines=True)
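For example, a minimal sketch of the write-back step inside your loop (untested against your files; outfile is just the path you already build, and df is your cleaned dataframe):

# write one JSON record per line, matching the original layout
outfile = ("c:/Users/USER/.spyder-py3/machine-learning/data2/testresult/user_"
           + str(i) + "_mod6.txt")
df.to_json(outfile, orient='records', lines=True)

This replaces the json.dumps / split('},') / manual write block, so every line of the output file is a single self-contained JSON object that json.loads can parse again.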



























          • Thanks, it works! I was overthinking it, which made the problem more complicated than it is.
            – Leran
            Nov 13 at 20:03










