Count data divided by year and by region in R

Score: 8

I have a very large (too big to open in Excel) biological dataset that looks something like this:

    year <- c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
              1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
              1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985)
    species <- c('A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A',
                 'B', 'B', 'B', 'C', 'C', 'C', 'A', 'A', 'A', 'B', 'B', 'B',
                 'C', 'C', 'C', 'A')
    region <- c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
                3, 2, 2, 1, 1, 1)
    df <- data.frame(year, species, region)

    df
       year species region
    1  1990       A      1
    2  1980       A      1
    3  1985       B      1
    4  1980       B      3
    5  1990       B      2
    6  1990       C      3
    7  1980       C      3
    8  1985       C      2
    9  1985       A      1
    10 1990       A      1
    11 1980       A      3
    12 1985       B      3
    13 1980       B      3
    14 1990       B      2
    15 1990       C      2
    16 1980       C      1
    17 1985       C      1
    18 1985       A      1
    19 1990       A      1
    20 1980       A      3
    21 1985       B      3
    22 1980       B      3
    23 1990       B      2
    24 1990       C      2
    25 1980       C      1
    26 1985       C      1
    27 1985       A      1

What I am looking to do is figure out how many of each species (A, B, or C) exist in each region (1, 2, or 3) in each of the three years I have (1980, 1985, or 1990).

I'm looking to end up with a dataset that looks something like this:

      region A_1980 B_1980 C_1980 A_1985 B_1985 C_1985 A_1990 B_1990 C_1990
    1      1      0      0      0      0      0      0      0      0      0
    2      2      1      1      1      1      1      1      1      1      1
    3      3      2      2      2      2      2      2      2      2      2

such that each row represents a region and each column holds the count of one species in one year. I've tried to do this using tidyr's spread function together with dplyr's group_by, but I couldn't get anything close to what I want.

Does anyone have any suggestions?

edited Nov 18 at 0:54 by m0nhawk
asked Nov 18 at 0:34 by cb14 (new contributor)

r grouping tidyverse data-management


2 Answers

Score: 10 (accepted)

Something like this?

    library(dplyr)
    library(tidyr)  # spread() comes from tidyr, not dplyr

    df2 <- df %>%
      mutate(sp_year = paste(species, year, sep = "_")) %>%
      group_by(region) %>%
      count(sp_year) %>%
      spread(sp_year, n)

    df2

Which gives this:

    # A tibble: 3 x 10
    # Groups:   region [3]
      region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
       <dbl>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
    1      1      1      3      3     NA      1     NA      2      2     NA
    2      2     NA     NA     NA     NA     NA      3     NA      1      2
    3      3      2     NA     NA      3      2     NA      1     NA      1
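
The NA cells are species/year combinations with no records in that region. As a cross-check that needs no packages, base R's table() counts every region by species_year combination and reports absent ones as 0. A minimal sketch, assuming R's stock table() and as.data.frame.matrix(); it rebuilds the question's df so it runs on its own, and the names sp_year, tab and wide are illustrative:

```r
# Rebuild the question's data so this snippet runs on its own.
df <- data.frame(
  year = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
           1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
           1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c("A", "A", "B", "B", "B", "C", "C", "C", "A", "A", "A",
              "B", "B", "B", "C", "C", "C", "A", "A", "A", "B", "B", "B",
              "C", "C", "C", "A"),
  region = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
             3, 2, 2, 1, 1, 1)
)

# Build the combined key, then cross-tabulate it against region.
# table() covers every level combination, so missing ones count as 0.
sp_year <- paste(df$species, df$year, sep = "_")
tab <- table(region = df$region, sp_year = sp_year)

# Turn the contingency table into an ordinary data frame:
# one row per region, one column per species_year.
wide <- as.data.frame.matrix(tab)
wide <- cbind(region = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
wide
```

The counts agree with the output above, with 0 in place of NA; columns come out in alphabetical order, i.e. species first, then year.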

answered Nov 18 at 0:59 by wl1234

Also possible to use ?tidyr::unite instead of mutate(paste). Would be less verbose at the very least.
– Shree, Nov 18 at 1:36
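
A small sketch of that suggestion, using a hypothetical three-row data frame toy (illustrative data only): both approaches build the same species_year key, but unite() also drops the source columns by default (remove = TRUE), whereas mutate(paste(...)) keeps them.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, for illustration only.
toy <- data.frame(year = c(1990, 1980, 1985),
                  species = c("A", "B", "C"),
                  region = c(1, 3, 2))

# mutate() + paste() adds the key and keeps species and year.
via_paste <- toy %>% mutate(sp_year = paste(species, year, sep = "_"))

# unite() builds the same key and removes species and year.
via_unite <- toy %>% unite(sp_year, species, year, sep = "_")

via_paste$sp_year
# [1] "A_1990" "B_1980" "C_1985"
names(via_unite)
```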


Score: 5

Similar to wl1234's answer, but more concise. We can use unite to combine the species and year columns, and we can call count on both sp_year and region instead of grouping by region first. Finally, setting fill = 0 in spread replaces the NA values with 0.

    library(tidyverse)

    df2 <- df %>%
      unite(sp_year, species, year, sep = "_") %>%
      count(sp_year, region) %>%
      spread(sp_year, n, fill = 0)
    df2
    # # A tibble: 3 x 10
    #   region A_1980 A_1985 A_1990 B_1980 B_1985 B_1990 C_1980 C_1985 C_1990
    #    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
    # 1      1      1      3      3      0      1      0      2      2      0
    # 2      2      0      0      0      0      0      3      0      1      2
    # 3      3      2      0      0      3      2      0      1      0      1
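
In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the same count with pivot_wider(), assuming tidyr 1.1.0 or later (for names_sort and a scalar values_fill). It also orders the columns year first, as in the layout sketched in the question (A_1980 B_1980 C_1980 A_1985 ...), whereas the spread() versions above sort species first. The data is rebuilt from the question so the snippet runs on its own:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  year = c(1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985, 1990,
           1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985,
           1990, 1980, 1985, 1980, 1990, 1990, 1980, 1985, 1985),
  species = c("A", "A", "B", "B", "B", "C", "C", "C", "A", "A", "A",
              "B", "B", "B", "C", "C", "C", "A", "A", "A", "B", "B", "B",
              "C", "C", "C", "A"),
  region = c(1, 1, 1, 3, 2, 3, 3, 2, 1, 1, 3, 3, 3, 2, 2, 1, 1, 1, 1, 3, 3,
             3, 2, 2, 1, 1, 1)
)

wide <- df %>%
  # One row per observed region/species/year combination, count in n.
  count(region, species, year) %>%
  pivot_wider(
    names_from  = c(year, species),    # sort key: year first, then species
    names_sort  = TRUE,                # order the new columns by that key
    names_glue  = "{species}_{year}",  # but label them A_1980, B_1980, ...
    values_from = n,
    values_fill = 0L                   # combinations with no rows become 0
  )

wide
```

Because count() works directly on the three original columns here, no combined sp_year key is needed; names_glue only controls how the new columns are labelled.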

edited Nov 18 at 1:40
answered Nov 18 at 1:35 by www

This is awesome, and I love the NA => 0 addition as well! Thank you!
– cb14, Nov 18 at 1:53

I didn't know about unite. I will use that instead of paste next time.
– wl1234, Nov 18 at 3:45
