Density plot looks normal, but the Q-Q plot does not and the Shapiro test is significant












2












$begingroup$


In the Q-Q plot below, too many points fall off the line, yet the density plot looks normal. The Shapiro test gives a p-value < 2.2e-16, which says this is not a normal distribution. However, I've read that the Shapiro test should not be trusted with about 1000 data points. Should I therefore conclude that the distribution is normal?

[Images: Q-Q plot and density plot]









$endgroup$












  • $begingroup$
    Clear asymmetry suggests something is up. Did you calculate the skewness? A nominal rule of thumb is that if the Pearson skewness is $\ge$ 0.1, you have to take corrective action, e.g., performing statistics on the log() of your measurements rather than on the measurements directly. Also, are there additional factors (metadata, etc.) that you can use to subselect your data? This may also be a mixture of models: you might have two or three normal distributions all sitting close together, with the second and third small enough not to create an obviously multimodal histogram.
    $endgroup$
    – Peter Leopold
    Feb 17 at 16:07






  • 1




    $begingroup$
    @Peter Where does that rule of thumb come from? It's not generally applicable, so it would be of interest to know its limitations and assumptions.
    $endgroup$
    – whuber
    Feb 17 at 16:12






  • 1




    $begingroup$
    @PeterLeopold "Pearson skewness" is not uniquely defined. Pearson himself put most emphasis on measuring skewness relative to the mode, which had a major role in his system of distributions. But he did also use a dimensionless ratio based on third and second moments around the mean. And yet again (mean $-$ median) / SD appears in his work. But regardless I wouldn't regard skewness of about 0.1 on any measure I've encountered as requiring transformation. I would always want to see the data, however.
    $endgroup$
    – Nick Cox
    Feb 17 at 16:18
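Two of the measures mentioned in this comment can be written down directly. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and neither function should be read as *the* Pearson skewness. The mode-based version is omitted because it needs a mode estimate, which depends on further choices.

```python
from statistics import fmean, median, stdev

def moment_skewness(data):
    """Dimensionless ratio of the third to the second central moment."""
    m = fmean(data)
    second = fmean([(x - m) ** 2 for x in data])
    third = fmean([(x - m) ** 3 for x in data])
    return third / second ** 1.5

def median_skewness(data):
    """(mean - median) / SD, using the sample standard deviation."""
    return (fmean(data) - median(data)) / stdev(data)
```

Both return 0 for perfectly symmetric data such as `[1, 2, 3, 4, 5]`, but they are on different scales, so a single threshold such as 0.1 cannot mean the same thing for both.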












  • $begingroup$
    @Peter The problem here is more about kurtosis than skewness, and it's not clear that taking the log would be justified even if the data were skewed. It depends on what the OP is going to use the data for.
    $endgroup$
    – Peter Flom
    Feb 17 at 16:18
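For readers who want to put a number on the kurtosis point, here is a minimal sketch, assuming the simple moment-based estimator (fourth standardized moment minus 3). The function name is hypothetical, and other estimators with small-sample corrections exist.

```python
from statistics import fmean

def excess_kurtosis(data):
    """Fourth standardized moment minus 3, which is 0 for a normal distribution."""
    m = fmean(data)
    second = fmean([(x - m) ** 2 for x in data])
    fourth = fmean([(x - m) ** 4 for x in data])
    return fourth / second ** 2 - 3.0
```

A value well away from 0 in symmetric data is the kind of departure that a density plot can hide and a Q-Q plot tends to expose.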










  • $begingroup$
    I would not trust a scale for happiness [NB] with such results!
    $endgroup$
    – Nick Cox
    Feb 17 at 16:20

























normal-distribution
















asked Feb 17 at 15:24









AnaHochma






















1 Answer


















2












$begingroup$

First, the density plot does not really look normal. It's symmetric, but the shape is wrong. I suggest generating a normal distribution with the same mean and variance as yours and then overlaying that density on the one you've got. I am fairly sure you will see a mismatch.
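The overlay suggested above can be sketched as follows. This is only an illustration, not the answerer's code: the names `normal_overlay` and `plot_overlay`, the bin count, and the synthetic heavy-tailed sample are all assumptions. The computation uses only the standard library, and matplotlib is imported only if the optional plotting helper is called.

```python
import random
from statistics import NormalDist

def normal_overlay(data, bins=30):
    """Return bin centers, histogram density, and the matching normal density."""
    fitted = NormalDist.from_samples(data)  # same mean and sample SD as the data
    lo, hi = min(data), max(data)
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in data:
        # Clamp the index so that the maximum value falls in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[index] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    # Dividing by n * width makes the bars integrate to 1, like a density.
    hist_density = [c / (len(data) * width) for c in counts]
    normal_density = [fitted.pdf(c) for c in centers]
    return centers, hist_density, normal_density

def plot_overlay(data, bins=30):
    """Optional plot; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt
    centers, hist_density, normal_density = normal_overlay(data, bins)
    width = centers[1] - centers[0]
    plt.bar(centers, hist_density, width=width, alpha=0.5, label="data")
    plt.plot(centers, normal_density, color="black", label="normal, same mean and SD")
    plt.legend()
    plt.show()

# Synthetic stand-in for the real measurements: symmetric but heavy-tailed
# (the difference of two exponential variables follows a Laplace distribution).
rng = random.Random(0)
sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(1000)]
centers, hist_density, normal_density = normal_overlay(sample)
```

Calling `plot_overlay(sample)` with your own data in place of `sample` shows where the two curves disagree, typically at the peak and in the tails.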



Second, a normal quantile (Q-Q) plot is often a better clue to nonnormality than a density plot.
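To make clear what such a plot compares, here is a minimal sketch of how its coordinates can be computed. The function names are hypothetical, and the plotting positions `(i - 0.5) / n` are one common convention chosen here as an assumption; statistical software may use a slightly different formula.

```python
from statistics import NormalDist, mean, stdev

def normal_qq_points(data):
    """Pair each sorted observation with the matching standard normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, SD 1
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, ordered

def reference_line(data):
    """Intercept and slope of the line the points would follow for normal data."""
    return mean(data), stdev(data)
```

When `ordered` is plotted against `theoretical`, roughly normal data fall near the line `intercept + slope * q`. Systematic bending away from the line at both ends points to tails that are heavier or lighter than normal.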



Third, and probably most importantly, why are you concerned about the normality of this variable? What are you going to do with the variable?















$endgroup$









  • 1




    $begingroup$
    Thanks for all the comments and answers. @Peter Flom I have measures of "Happyness" for two groups of people. I plotted them against time, and it looks like one group gets higher values, so I am trying to compare the groups statistically. I ran the Shapiro test on both groups and got p-values << 0.05, so I don't know whether to use a t-test or a Wilcoxon test.
    $endgroup$
    – AnaHochma
    Feb 17 at 18:42










  • $begingroup$
    I'd go with Wilcoxon. Or maybe a bootstrap.
    $endgroup$
    – Peter Flom
    Feb 17 at 19:08
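The bootstrap option mentioned in this comment can be sketched as a percentile interval for the difference in group means. This is a minimal sketch under stated assumptions: the function name, the number of resamples, and the percentile method are illustrative choices, and the code treats all observations as independent, which may not hold for repeated measurements over time. For the Wilcoxon rank-sum test itself, SciPy provides `scipy.stats.mannwhitneyu`.

```python
import random
from statistics import fmean

def bootstrap_mean_difference(group_a, group_b, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)
    differences = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(fmean(resample_a) - fmean(resample_b))
    differences.sort()
    tail = (1.0 - level) / 2.0
    lower_index = round(tail * n_resamples)
    upper_index = round((1.0 - tail) * n_resamples) - 1
    return differences[lower_index], differences[upper_index]
```

An interval that excludes 0 suggests that the group means differ; no normality assumption is needed for the resampling itself.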










  • $begingroup$
    Ah, two distributions! Thanks for confirming what the data was hinting strongly at.
    $endgroup$
    – Peter Leopold
    Feb 17 at 19:15






  • 1




    $begingroup$
    Thanks! So I'll go with Wilcoxon too :) Is the main reason that the Shapiro test p-value is << 0.05?
    $endgroup$
    – AnaHochma
    Feb 17 at 19:22
























answered Feb 17 at 16:15









Peter Flom







