Hive: How to deal with files that contain a variable number of fields?


I am dealing with a file on HDFS whose lines contain different numbers of fields separated by ','. For instance:

uid1, eid01, para1, para2, para3,para4,para5,timestamp

uid1, eid12, para56, para57, timestamp

uid3, eid42, para102,timestamp

The number of fields is not fixed.

Now I want to load this data into a Hive table that has 4 columns, with all of the 'para..' fields in one column, like:

  uid    eid              para                  datatime

  uid1  eid01  para1, para2, para3,para4,para5  timestamp

  uid1  eid12  para56, para57                   timestamp

  uid3  eid42  para102                          timestamp

The data volume is so large that I cannot process it with tools like AWK. Is there any other solution?

Any help is appreciated.

edited Nov 22 '18 at 8:44

asked Nov 22 '18 at 8:38

user2894829

hadoop hive


1 Answer

Create a temporary Hive table such as t_data_tmp(line string) with a single column. Load the HDFS file into t_data_tmp so that each line becomes one row.

Create a Hive table t_data with your target schema, then fill it with an INSERT OVERWRITE on t_data that selects from t_data_tmp.

In the SELECT from t_data_tmp, use Hive string functions (finding a position, for example with locate or instr, and substr) to work out the value of each column, using the second comma and the last comma as the split points.
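The steps above could be sketched in HiveQL roughly as follows. This is only an illustration under a few assumptions: the HDFS path '/path/to/input' is a placeholder, all four target columns are declared as STRING, the column names follow the question's example header, and lines with fewer than four fields are simply skipped.

```sql
-- Staging table: a single STRING column, one row per input line.
-- With the default text format the field delimiter is \001, so a
-- comma-separated line is stored whole in `line`.
CREATE TABLE t_data_tmp (line STRING)
STORED AS TEXTFILE;

-- '/path/to/input' is a placeholder. Note that LOAD DATA INPATH moves
-- the HDFS file into the table's directory rather than copying it.
LOAD DATA INPATH '/path/to/input' INTO TABLE t_data_tmp;

-- Target table with the four columns from the question (types assumed).
CREATE TABLE t_data (
  uid      STRING,
  eid      STRING,
  para     STRING,
  datatime STRING
);

INSERT OVERWRITE TABLE t_data
SELECT
  trim(substr(line, 1, c1 - 1))               AS uid,      -- before the 1st comma
  trim(substr(line, c1 + 1, c2 - c1 - 1))     AS eid,      -- between 1st and 2nd comma
  trim(substr(line, c2 + 1, c_last - c2 - 1)) AS para,     -- between 2nd and last comma
  trim(substr(line, c_last + 1))              AS datatime  -- after the last comma
FROM (
  SELECT
    line,
    locate(',', line)                            AS c1,     -- position of the 1st comma
    locate(',', line, locate(',', line) + 1)     AS c2,     -- position of the 2nd comma
    length(line) - instr(reverse(line), ',') + 1 AS c_last  -- position of the last comma
  FROM t_data_tmp
) p
-- Keep only lines that have at least three commas (four fields).
WHERE c2 > 0 AND c_last > c2;
```

The para column keeps the inner commas as they appear in the source line; only leading and trailing whitespace is trimmed.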

answered Nov 22 '18 at 8:51

Tom


Well, thanks for your reply. I think split() and reverse() can achieve what I need.

– user2894829
Nov 22 '18 at 9:17
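The comment does not show a query, so the following is only one guess at how split() and reverse() might be combined. It assumes the staging table t_data_tmp(line string) described in the answer; the aliases parts and ts_raw are made up for illustration.

```sql
SELECT
  trim(parts[0]) AS uid,
  trim(parts[1]) AS eid,
  -- para is whatever lies between the 2nd comma and the last comma:
  -- skip the first two fields plus two commas, and stop before the
  -- last comma and the trailing field.
  trim(substr(line,
              length(parts[0]) + length(parts[1]) + 3,
              length(line) - length(parts[0]) - length(parts[1])
                - length(ts_raw) - 3)) AS para,
  trim(ts_raw) AS datatime
FROM (
  SELECT
    line,
    split(line, ',') AS parts,
    -- Reverse the line, take the first field, and reverse it back
    -- to get the text after the last comma.
    reverse(split(reverse(line), ',')[0]) AS ts_raw
  FROM t_data_tmp
) s
-- Only lines with at least four fields have a para section.
WHERE size(parts) >= 4;
```

The same SELECT could be placed behind an INSERT OVERWRITE on the target table to populate it.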

