How to split dataframe in R to conduct statistical tests? [closed]

-1

I have 3 variables, A1, A2 and A3

A1 is temperature

A2 is month

A3 is location

A2 has 2 months: March and May. A3 has 2 cities: Chennai and Dubai.

But when I compute a correlation between A1 and A3:

cor(A1,A3, method = "pearson")

I get this error:

'y' must be numeric

How can I fix this, please?

Many Thanks,
Ishack

edited Nov 19 '18 at 13:26

Ned


asked Nov 19 '18 at 12:32

Ishack Marshook

closed as off-topic by jogo, Andre Elrico, Rui Barradas, Sven Hohenstein, phiver Nov 19 '18 at 17:34

This question appears to be off-topic. The users who voted to close gave this specific reason:

"Questions seeking debugging help ("why isn't this code working?") must include the desired behavior, a specific problem or error and the shortest code necessary to reproduce it in the question itself. Questions without a clear problem statement are not useful to other readers. See: How to create a Minimal, Complete, and Verifiable example." – jogo, Sven Hohenstein, phiver

If this question can be reworded to fit the rules in the help center, please edit the question.

1

Welcome to SO! Please read How to Ask and give a Minimal, Complete, and Verifiable example in your question!

– jogo
Nov 19 '18 at 12:47

2

You cannot correlate nominal/dichotomous data with continuous data. You could do a ?t.test or ?wilcox.test for A1 vs A2 and A1 vs A3. Please read and watch YouTube videos on "correlation" and the two tests I mentioned.

– Andre Elrico
Nov 19 '18 at 12:50

Maybe the accepted answer to Correlations with unordered categorical variables will help.

– Rui Barradas
Nov 19 '18 at 12:56

Can I use t.test instead of the correlation?

– Ishack Marshook
Nov 19 '18 at 14:01

A t-test would test the null hypothesis that the average temperatures between the two cities are equal. A paired t-test would test the null hypothesis that pairs of temperature readings taken at the same time are equal. A correlation would explain the degree to which increases in temperature in Chennai are associated with increases in temperature in Dubai. Which hypothesis do you wish to test?

– Len Greski
Nov 19 '18 at 17:15

r dataframe split correlation


1 Answer
1


There are many ways to split the data, but the first question to answer is "what hypothesis do I wish to test?"

Here is example code using average daily high temperatures by month in Chennai and Dubai, taken from timeanddate.com:

# Monthly average high temperatures (degrees Fahrenheit), 2005 - 2015
# https://www.timeanddate.com/weather/india/chennai/climate
# https://www.timeanddate.com/weather/united-arab-emirates/dubai/climate
rawData <- "
temperature,month,city
75,Jan,Dubai
78,Feb,Dubai
83,Mar,Dubai
92,Apr,Dubai
100,May,Dubai
103,Jun,Dubai
106,Jul,Dubai
107,Aug,Dubai
102,Sep,Dubai
96,Oct,Dubai
87,Nov,Dubai
79,Dec,Dubai
86,Jan,Chennai
89,Feb,Chennai
93,Mar,Chennai
97,Apr,Chennai
102,May,Chennai
100,Jun,Chennai
97,Jul,Chennai
95,Aug,Chennai
95,Sep,Chennai
92,Oct,Chennai
87,Nov,Chennai
86,Dec,Chennai"

# read the CSV text into a data frame
tempData <- read.csv(text=rawData)

# t-test for average temperatures (groups treated as independent)
t.test(tempData[tempData$city == "Dubai","temperature"],
       tempData[tempData$city == "Chennai","temperature"],
       paired=FALSE)

# paired t-test (observations matched by month)
t.test(tempData[tempData$city == "Dubai","temperature"],
       tempData[tempData$city == "Chennai","temperature"],
       paired=TRUE)

# correlation between the two cities' monthly values
cor(tempData[tempData$city == "Dubai","temperature"],
    tempData[tempData$city == "Chennai","temperature"])

Two Sample t-test

The two-sample t-test tests the null hypothesis that the two group means are equal, treating the groups as independent and ignoring any pairing between their observations. Pairing may be based on time (as with the temperature data, where each month yields one value per city) or on other characteristics (e.g. twins in a study where one twin from each pair is randomly assigned to the test group and the other to the control group). Note that t.test() does not assume equal variances by default, which is why the output is labeled a Welch two-sample t-test.

> t.test(tempData[tempData$city =="Dubai","temperature"],
+        tempData[tempData$city == "Chennai","temperature"],
+        paired=FALSE)

    Welch Two Sample t-test

data:  tempData[tempData$city == "Dubai", "temperature"] and tempData[tempData$city == "Chennai", "temperature"]
t = -0.24817, df = 15.546, p-value = 0.8073
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -8.765568  6.932235
sample estimates:
mean of x mean of y 
 92.33333  93.25000

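Since 0 is within the 95% confidence interval (and the p-value of 0.81 is far above 0.05), we fail to reject the null hypothesis that there is no difference in the monthly average high temperatures between Chennai and Dubai. This does not prove the means are equal; it only means these data show no detectable difference.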
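As a possible alternative to subsetting by hand, t.test() also accepts a formula, which splits the response by a two-level grouping column. The sketch below rebuilds the same data as a data frame (the construction with rep() and month.abb is my own shorthand, not part of the original answer) and should reproduce the Welch test above.

```r
# Same temperatures as in the answer, built directly as a data frame
tempData <- data.frame(
  temperature = c(75, 78, 83, 92, 100, 103, 106, 107, 102, 96, 87, 79,  # Dubai
                  86, 89, 93, 97, 102, 100, 97, 95, 95, 92, 87, 86),    # Chennai
  month = rep(month.abb, times = 2),
  city  = rep(c("Dubai", "Chennai"), each = 12)
)

# Formula interface: numeric response ~ grouping variable with two levels
welch <- t.test(temperature ~ city, data = tempData)
print(welch)

# Groups are ordered alphabetically (Chennai first, then Dubai), so the sign
# of the t statistic is flipped relative to the call above; the p-value and
# the width of the confidence interval are unchanged.
```

The formula form avoids repeating the subsetting expression, at the cost of the group order being decided by the factor levels rather than by argument order.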

Paired t-test

The paired t-test calculates the difference between the pairs of observations and tests the null hypothesis that the average difference is 0.

> # paired t-test
> t.test(tempData[tempData$city =="Dubai","temperature"],
+        tempData[tempData$city == "Chennai","temperature"],
+        paired=TRUE)

    Paired t-test

data:  tempData[tempData$city == "Dubai", "temperature"] and tempData[tempData$city == "Chennai", "temperature"]
t = -0.39555, df = 11, p-value = 0.7
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -6.017343  4.184009
sample estimates:
mean of the differences 
             -0.9166667

Since 0 is within the 95% confidence interval, we fail to reject the null hypothesis that the mean difference is 0 when the test is conducted on the month-by-month differences in average high temperature between Dubai and Chennai.
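To make the "difference between the pairs" idea concrete, here is a small sketch showing that the paired test matches a one-sample t-test on the month-by-month differences. The vector names dubai, chennai and diffs are mine, introduced only for illustration.

```r
# Monthly values in the same Jan-Dec order for both cities
dubai   <- c(75, 78, 83, 92, 100, 103, 106, 107, 102, 96, 87, 79)
chennai <- c(86, 89, 93, 97, 102, 100, 97, 95, 95, 92, 87, 86)

# Differences within each month (Dubai minus Chennai)
diffs <- dubai - chennai

# Paired t-test on the two vectors
paired <- t.test(dubai, chennai, paired = TRUE)

# One-sample t-test of the differences against a mean of 0
oneSample <- t.test(diffs, mu = 0)

# Both approaches should give the same statistic and p-value
print(c(paired = unname(paired$statistic), oneSample = unname(oneSample$statistic)))
print(c(paired = paired$p.value, oneSample = oneSample$p.value))
print(mean(diffs))
```

This only works because the two vectors are in the same month order; if the rows were shuffled, the differences would pair the wrong months.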

Correlation

The Pearson correlation measures the strength of the linear relationship between two numeric variables: -1 is a perfect negative correlation, 0 is no linear correlation, and 1 is a perfect positive correlation.

> cor(tempData[tempData$city =="Dubai","temperature"],
+     tempData[tempData$city =="Chennai","temperature"])
[1] 0.7929018

A correlation of 0.79 indicates a strong positive linear relationship between monthly average high temperatures in Dubai and Chennai.
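The sketch below ties this back to the error in the question: passing the city column to cor() fails because it is text rather than numeric, whereas two numeric vectors of equal length work. It also uses cor.test() from base R's stats package, which I am adding here as one way to get a confidence interval and p-value alongside the coefficient; the data frame construction is my own shorthand for the data above.

```r
# Same temperatures as in the answer, built directly as a data frame
tempData <- data.frame(
  temperature = c(75, 78, 83, 92, 100, 103, 106, 107, 102, 96, 87, 79,  # Dubai
                  86, 89, 93, 97, 102, 100, 97, 95, 95, 92, 87, 86),    # Chennai
  month = rep(month.abb, times = 2),
  city  = rep(c("Dubai", "Chennai"), each = 12)
)

# Correlating temperature with the city labels raises an error,
# because the second argument is not numeric; capture the message
msg <- tryCatch(
  cor(tempData$temperature, tempData$city, method = "pearson"),
  error = function(e) conditionMessage(e)
)
print(msg)

# Split into one numeric vector per city, then correlate those
dubai   <- tempData$temperature[tempData$city == "Dubai"]
chennai <- tempData$temperature[tempData$city == "Chennai"]

ct <- cor.test(dubai, chennai, method = "pearson")
print(ct$estimate)   # correlation coefficient
print(ct$conf.int)   # 95% confidence interval
print(ct$p.value)
```

With only 12 pairs the confidence interval is likely to be wide, so the point estimate alone should be read with some caution.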

Technique used to split the data

Since I created the raw data as CSV text and loaded it into R with read.csv(), I used the [ form of the extract operator to extract rows based on the value of the city column. I also entered the raw data in month order for each city, so the values in each subset line up by month, which is what the paired t-test requires.

# extract temperature values for Dubai
tempData[tempData$city =="Dubai","temperature"]

A wide variety of techniques can be used to subset data from an R data frame, such as the which() function and the sqldf() function.
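For illustration, here are a few base R alternatives side by side: split(), which(), and subset(). This is a sketch rather than a recommendation of any one approach; the variable names (byCity, viaBracket, viaWhich, viaSubset, marMay) are mine, and sqldf is left out because it requires an extra package.

```r
# Same temperatures as in the answer, built directly as a data frame
tempData <- data.frame(
  temperature = c(75, 78, 83, 92, 100, 103, 106, 107, 102, 96, 87, 79,  # Dubai
                  86, 89, 93, 97, 102, 100, 97, 95, 95, 92, 87, 86),    # Chennai
  month = rep(month.abb, times = 2),
  city  = rep(c("Dubai", "Chennai"), each = 12)
)

# split(): one call returns a named list with a numeric vector per city
byCity <- split(tempData$temperature, tempData$city)
str(byCity)

# Three equivalent ways to pull out the Dubai temperatures
viaBracket <- tempData[tempData$city == "Dubai", "temperature"]
viaWhich   <- tempData$temperature[which(tempData$city == "Dubai")]
viaSubset  <- subset(tempData, city == "Dubai")$temperature

# Restrict to the two months mentioned in the question
marMay <- subset(tempData, month %in% c("Mar", "May"))
print(marMay)

# The list elements can be passed straight to a test
t.test(byCity$Dubai, byCity$Chennai, paired = TRUE)
```

split() keeps the rows of each group in their original order, so the month alignment needed for the paired test is preserved as long as the data frame itself is ordered consistently.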

edited Nov 20 '18 at 0:58

answered Nov 19 '18 at 17:28

Len Greski


Thanks a lot Len.

– Ishack Marshook
Nov 20 '18 at 13:25

@IshackMarshook please accept the answer if you found it to be helpful.

– Len Greski
Nov 20 '18 at 13:26

Done, Len. Can I ask a few more questions, please?

– Ishack Marshook
Nov 20 '18 at 13:28

@IshackMarshook - if your followup questions are directly related to this one, yes. Otherwise, please post a new question and the SO community will answer it. Part of SO etiquette is that a post focuses on a single question.

– Len Greski
Nov 20 '18 at 15:42
