Python: cannot get original format after reading .txt file into dataframe and writing back to file











I have a .txt file that I need to do outlier removal on. The file looks like this:



{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"1", "pid":"0", "pidx":"0", "act":"DOWN-1ST", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"0.0007", "vy":"0.0013"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"0", "pidx":"0", "act":"DOWN-P2", "x":"557.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"NaN", "vy":"NaN"}
{"mille":"802821", "type":"tc", "test":"mod6", "hrow":"C", "pcnt":"2", "pid":"1", "pidx":"1", "act":"DOWN-P2", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"Infinity", "vy":"-Infinity"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"556.00", "y":"1043.00", "size":"0.3333", "press":"0.6000", "vx":"-5.3125", "vy":"18.0625"}
{"mille":"802837", "type":"th", "test":"mod6", "hrow":"0", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"5.3125", "vy":"-18.0625"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"555.00", "y":"1044.00", "size":"0.3333", "press":"0.6000", "vx":"-3.4400", "vy":"11.6000"}
{"mille":"802846", "type":"th", "test":"mod6", "hrow":"1", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"3.4400", "vy":"-11.6000"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"554.00", "y":"1045.00", "size":"0.3333", "press":"0.6000", "vx":"-2.6364", "vy":"8.8182"}
{"mille":"802854", "type":"th", "test":"mod6", "hrow":"2", "pcnt":"2", "pid":"1", "pidx":"1", "act":"MOVE", "x":"641.00", "y":"754.00", "size":"0.2000", "press":"0.5500", "vx":"2.6364", "vy":"-8.8182"}
{"mille":"802863", "type":"th", "test":"mod6", "hrow":"3", "pcnt":"2", "pid":"0", "pidx":"0", "act":"MOVE", "x":"553.00", "y":"1047.00", "size":"0.3333", "press":"0.6125", "vx":"-2.0952", "vy":"6.9762"}


................ (there are many more lines like this in each file, and I have several files)



(Note that there is no blank line between any two {} records in the original text file.)



I read each file into a dataframe with pd.read_json() (with lines=True) and finished the outlier removal.
Now I need to write it back to a text file with exactly the same format as before.
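Just to show what one of these files looks like once loaded (a minimal sketch; the file name here is only an example):

import pandas as pd

# each record sits on its own line, so the file reads as line-delimited JSON
df = pd.read_json('c:/Users/USER/.spyder-py3/machine-learning/data2/test/sample.txt',
                  lines=True)

# pandas seems to convert the quoted numbers ("557.00", "NaN", "Infinity", ...)
# into floats, so the dataframe no longer holds the original strings
print(df.dtypes)
print(df.head())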



Here is my code:



import glob
import json
from collections import OrderedDict

import numpy as np
import pandas as pd

path = 'c:/Users/USER/.spyder-py3/machine-learning/data2/test/*.txt'
filelist = glob.glob(path, recursive=True)

for i in range(0, 3):
    df = pd.read_json(filelist[i], lines=True)

    outlier_x = df['x'].mean() + df['x'].std() * 3
    outlier_x2 = df['x'].mean() - df['x'].std() * 3
    outlier_y = df['y'].mean() + df['y'].std() * 3
    outlier_y2 = df['y'].mean() - df['y'].std() * 3
    outlier_vx = df['vx'].mean() + df['vx'].std() * 3
    outlier_vx2 = df['vx'].mean() - df['vx'].std() * 3
    outlier_vy = df['vy'].mean() + df['vy'].std() * 3
    outlier_vy2 = df['vy'].mean() - df['vy'].std() * 3
    outlier_pr = df['press'].mean() + df['press'].std() * 3
    outlier_pr2 = df['press'].mean() - df['press'].std() * 3
    outlier_sz = df['size'].mean() + df['size'].std() * 3
    outlier_sz2 = df['size'].mean() - df['size'].std() * 3

    # drop some columns and reorder the remaining ones
    df.drop(['act1', 'act2', 'size1', 'size2', 'x1', 'x2', 'y1', 'y2'],
            axis=1, inplace=True)
    df = df[['mille', 'type', 'test', 'xfocus', 'yfocus', 'span', 'sfact', 'hrow',
             'pcnt', 'pid', 'pidx', 'act', 'x', 'y', 'size', 'press', 'vx', 'vy']]

    # remove outliers from column 'x'
    df = df.drop(df[(df['x'] > outlier_x) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['x'] < outlier_x2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'y'
    df = df.drop(df[(df['y'] > outlier_y) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['y'] < outlier_y2) & (df['act'] == 'MOVE')].index)
    # remove part of the infinite values from column 'vx'
    df = df.drop(df[(df['vx'] == np.inf) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vx'] == -np.inf) & (df['act'] == 'MOVE')].index)
    # replace the remaining infinities in 'vx' with the column mean
    df['vx'] = df['vx'].replace([np.inf, -np.inf], df['vx'].mean())
    # remove outliers from column 'vx'
    df = df.drop(df[(df['vx'] > outlier_vx) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vx'] < outlier_vx2) & (df['act'] == 'MOVE')].index)
    # replace infinities in 'vy' with the column mean
    df['vy'] = df['vy'].replace([np.inf, -np.inf], df['vy'].mean())
    # fill NaN with 0 in column 'vx'
    df['vx'] = df['vx'].fillna(0.0)
    # fill NaN with 0 in column 'vy'
    df['vy'] = df['vy'].fillna(0.0)
    # remove outliers from column 'vy'
    df = df.drop(df[(df['vy'] > outlier_vy) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['vy'] < outlier_vy2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'press'
    df = df.drop(df[(df['press'] > outlier_pr) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['press'] < outlier_pr2) & (df['act'] == 'MOVE')].index)
    # remove outliers from column 'size'
    df = df.drop(df[(df['size'] > outlier_sz) & (df['act'] == 'MOVE')].index)
    df = df.drop(df[(df['size'] < outlier_sz2) & (df['act'] == 'MOVE')].index)

    # blank out vx/vy (set to NaN) on rows where xfocus is present
    df.loc[df.xfocus.notnull(), ['vx', 'vy']] = np.nan, np.nan

    col_select = ['mille', 'type', 'test', 'xfocus', 'yfocus', 'span', 'sfact', 'hrow',
                  'pcnt', 'pid', 'pidx', 'act', 'x', 'y', 'size', 'press', 'vx', 'vy']

    # convert the dataframe to a JSON string of records
    jsonresult = df.to_json(orient='records')
    # parse the JSON string back into a list of dictionaries
    rows = json.loads(jsonresult)

    # drop the null values and keep the column order
    new_rows = [OrderedDict([(key, row[key]) for key in col_select
                             if (key in row) and pd.notnull(row[key])])
                for row in rows]

    jsonfile = json.dumps(new_rows)

    # save the result to the destination file
    outfile = ('c:/Users/USER/.spyder-py3/machine-learning/data2/testresult/user_'
               + str(i) + '_mod6.txt')
    thefile = open(outfile, 'w')

    # split the JSON string back into one record per line
    json_output = jsonfile.strip("").split('},')

    for i in range(len(json_output)):
        json_output[i] = json_output[i] + '}'

    for item in json_output:
        thefile.write("%s\n" % item)


I tried to get a .txt file just like the original one, and the output does look similar. But when I try to read the cleaned .txt file and do other operations on it, I get an error: JSONDecodeError: Extra data: line 1 column 201 (char 200). The entire error message is below:



---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-7-a2c25911084b> in <module>()
   2321 print('-----------------------test where am I--------------------------------')
   2322 for line in file_object:
-> 2323     jrecord = json.loads(line)
   2324     try:
   2325         typ = jrecord['type']

~\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    352         parse_int is None and parse_float is None and
    353         parse_constant is None and object_pairs_hook is None and not kw):
--> 354         return _default_decoder.decode(s)
    355     if cls is None:
    356         cls = JSONDecoder

~\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    340         end = _w(s, end).end()
    341         if end != len(s):
--> 342             raise JSONDecodeError("Extra data", s, end)
    343         return obj
    344

JSONDecodeError: Extra data: line 1 column 201 (char 200)


There was no such error when I dealt with the .txt file that was not cleaned, so obviously something goes wrong when I write the data back. I'm stuck here now and don't know how to move on. Can anybody help me out? Thanks in advance!
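From the column number in the message I suspect that in my cleaned file a line contains extra characters (or even a second record) right after the first complete {...} object. A minimal snippet with made-up data reproduces the same kind of error:

import json

# one complete JSON object followed by a stray character on the same line
line = '{"mille": "802821", "act": "MOVE"}]'

json.loads(line)  # json.JSONDecodeError: Extra data: line 1 column 35 (char 34)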










Tags: python, json, readfile






edited Nov 13 at 18:07 by Vasilis G.
asked Nov 13 at 17:56 by Leran
























1 Answer

















You should be able to write your cleaned dataframe out with df.to_json(outfile, orient='records', lines=True).
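For example, the manual json.dumps/split/write block at the end of your loop could be replaced with something like this (a minimal sketch reusing the output path from your question):

# write the cleaned dataframe back out as line-delimited JSON:
# one record per line, no surrounding list brackets
outfile = ('c:/Users/USER/.spyder-py3/machine-learning/data2/testresult/user_'
           + str(i) + '_mod6.txt')
df.to_json(outfile, orient='records', lines=True)

One caveat: the original files store every value as a quoted string ("557.00", "0.6000"), while pandas parses them as numbers, so the rewritten lines will contain unquoted numbers rather than the original quoted strings. Reading them back line by line with json.loads still works; it only matters if downstream code expects the values to be strings.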






answered Nov 13 at 18:17 by Kirk (accepted)
• Thanks, it works! I thought too much of it so that made this problem complicated. – Leran, Nov 13 at 20:03










