How to ignore dtype when reading a CSV file using Dask DataFrame



























I have a large CSV file with 9600 columns, and each column has a different type. When I read the file with Dask DataFrame and call head(), I get the error "Mismatched dtypes found in pd.read_csv/pd.read_table". How can I ignore it? Reading the file with pandas raises no errors, but it is very slow because the file is 2.5 GB.
Thanks!






























  • That's a lot of columns! How many rows?

    – mdurant
    Nov 21 '18 at 14:50

  • Could you provide a sample of your CSV file, along with the code you use to import it?

    – leoburgy
    Nov 21 '18 at 14:58

  • Depending on how much you know about the data types present in the data set, you may want to read every column as a string by setting the dtype argument of the read_csv() function.

    – leoburgy
    Nov 21 '18 at 15:03

  • A footnote in the Dask docs notes that, despite the dtype inference, the presence of NaN can confuse the CSV reader function. docs.dask.org/en/latest/…

    – leoburgy
    Nov 21 '18 at 15:06

  • ^ This doesn't tell us much. You said that the types were string or null, so explicitly loading as str sounds OK, but you have more information than we do.

    – mdurant
    Nov 21 '18 at 15:47















python pandas dask














edited Nov 21 '18 at 15:09









leoburgy











asked Nov 21 '18 at 14:47









Hoàng Quốc Cường














