ggalluvial: How do I plot an alluvial diagram when I have a dataframe with links and nodes?
I have a data frame with timepoints (a, b and c), labels (l1, l2, l3) and frequencies that are distributed over the timepoints and labels. Each row is a link between a label at one timepoint and a label at the next.
I want to create a Sankey diagram with the ggalluvial package in R.
Here's some code:
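    library(tidyverse)
    library(forcats)
    library(ggalluvial)
    library(magrittr)

    # Draw an alluvial diagram; `name` is the unquoted column mapped to y
    plotAlluvial <- function(.df, name = freq) {
      y_name <- enquo(name)
      ggplot(.df,
             aes(x = tp, stratum = lbl, alluvium = id,
                 label = lbl, fill = lbl, y = !!y_name)) +
        geom_stratum() +
        geom_flow(stat = "flow", color = "darkgray") +
        geom_text(stat = "stratum") +
        scale_fill_brewer(type = "qual", palette = "Set2")
    }

    x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)  # frequencies of the a -> b links
    x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)  # frequencies of the b -> c links

    # One row per link; tibble() replaces the deprecated data_frame()
    df <- tibble(
      tp1  = rep(c("a", "b"), each = 9),
      lbl1 = rep(c("l1", "l2", "l3"), 2, each = 3),
      tp2  = rep(c("b", "c"), each = 9),
      lbl2 = rep(c("l1", "l2", "l3"), 6),
      freq = c(x1, x2)
    )

    # Long form: two rows per link, one for each end of the link
    df2 <- df %>%
      mutate(id = row_number()) %>%
      unite(un1, c(tp1, lbl1)) %>%
      unite(un2, c(tp2, lbl2)) %>%
      tidyr::gather(key, value, -c(freq, id)) %>%
      separate(value, c("tp", "lbl"))

    # Left half (a -> b): drop the rows where b is the start of a link
    df2.left <- df2 %>%
      dplyr::filter(!(key == "un1" & tp == "b"))
    # Right half (b -> c): drop the rows where b is the end of a link
    df2.right <- df2 %>%
      dplyr::filter(!(key == "un2" & tp == "b"))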
I can plot the left side and the right side of the diagram I want separately:

    plotAlluvial(df2.left)
    plotAlluvial(df2.right)

But if I try to plot both sides at the same time, I get this plot:

    plotAlluvial(df2)

With the code above, the diagram shows too much frequency at timepoint b. The strata at b should stack as high as those at the other two timepoints, i.e. to a total height of 25.
What am I doing wrong? How can I create a diagram that combines the first two plots?
EDIT:
Following a comment, I added a variable that expresses each frequency as a proportion of its timepoint total. Now the strata at b have the correct height, but the incoming and outgoing flows still occupy only 50% of each stratum at timepoint b.

    df2 %<>%
      group_by(tp) %>%
      mutate(prop = freq / sum(freq)) %>%
      ungroup()
    plotAlluvial(df2, prop)
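A quick check of the per-timepoint totals may help pin down the symptom. The sketch below rebuilds the same long layout in base R (the names `links`, `long`, `totals` and `share_in` are only illustrative and not part of the code above) and sums `freq` per timepoint.

```r
# Link frequencies copied from the question
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)  # a -> b links (ids 1-9)
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)  # b -> c links (ids 10-18)

links <- data.frame(
  id   = 1:18,
  tp1  = rep(c("a", "b"), each = 9),
  tp2  = rep(c("b", "c"), each = 9),
  freq = c(x1, x2),
  stringsAsFactors = FALSE
)

# Same layout as df2: every link contributes one row per endpoint
long <- rbind(
  data.frame(id = links$id, tp = links$tp1, freq = links$freq,
             stringsAsFactors = FALSE),
  data.frame(id = links$id, tp = links$tp2, freq = links$freq,
             stringsAsFactors = FALSE)
)

# Total frequency stacked at each timepoint
totals <- tapply(long$freq, long$tp, sum)
print(totals)

# Share of the height at b that belongs to the incoming a -> b links
share_in <- sum(long$freq[long$tp == "b" & long$id <= 9]) / totals[["b"]]
print(share_in)
```

The totals come out as 25 at a, 50 at b and 25 at c, and the incoming links account for 0.5 of the height at b. Timepoint b is counted once as the end of the a -> b links and once as the start of the b -> c links, and no `id` spans all three timepoints, which would be consistent with the flows filling only half of each stratum at b.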
r ggplot2 sankey-diagram
Comments:
Could you please provide the desired diagram? Unfortunately, I cannot understand what is wrong with your output.
– atsyplenkov
Nov 21 '18 at 16:38
Sorry, I can't. But I want the timepoint b stratum to be as high as the other two strata. So freq should also be 25.
– Robert
Nov 21 '18 at 19:38
Do you basically want to be able to use position = "fill" (as documented here) with the ggalluvial geoms, and plot percentages rather than values? This isn't possible at present; you'd need to transform the data before calling ggplot() to achieve this.
– Cory Brunson
Nov 22 '18 at 22:02
Thanks Cory! That does look like what I want. Do you have any idea how I should transform the data above so I can make the diagram I want? I have changed my question and added more pictures to better explain what I want.
– Robert
Nov 23 '18 at 8:32
@Robert, on closer inspection, it seems like you might be mis-encoding your data. Are you trying to track the distribution of 25 subjects over three labels along three time points? In that case, here's an artificial data frame of the correct (long form) structure:
tidyr::crossing(sub = 1:25, tp = letters[1:3]) %>% mutate(lbl = sample(paste0("l", 1:3), 25 * 3, replace = TRUE))
– Cory Brunson
Nov 23 '18 at 18:51
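One possible way to get from the link table to the long-form structure described in the last comment is sketched below. Two pairwise link tables do not determine full a -> b -> c trajectories, so the sketch assumes that everything arriving in a label at b is split over the outgoing b -> c links in proportion to their frequencies. That proportional split is an assumption introduced here, not something stated in the question or provided by ggalluvial, and the names `ab`, `bc`, `paths` and `long` are hypothetical.

```r
lbls <- c("l1", "l2", "l3")
x1 <- c(6, 0, 0, 5, 5, 4, 2, 0, 3)  # a -> b link frequencies
x2 <- c(5, 5, 3, 0, 0, 5, 0, 7, 0)  # b -> c link frequencies

# Transition tables: rows = label at the earlier timepoint,
# columns = label at the later timepoint (same ordering as lbl1/lbl2)
ab <- matrix(x1, nrow = 3, byrow = TRUE, dimnames = list(a = lbls, b = lbls))
bc <- matrix(x2, nrow = 3, byrow = TRUE, dimnames = list(b = lbls, c = lbls))
n_b <- colSums(ab)  # total per label at b; matches rowSums(bc) for these data

# ASSUMPTION: proportional split at b, so
# freq(a, b, c) = ab[a, b] * bc[b, c] / n_b[b]
paths <- expand.grid(a = lbls, b = lbls, c = lbls, stringsAsFactors = FALSE)
paths$freq <- unname(
  ab[cbind(paths$a, paths$b)] * bc[cbind(paths$b, paths$c)] / n_b[paths$b]
)
paths <- paths[paths$freq > 0, ]
paths$id <- seq_len(nrow(paths))

# Long form: each alluvium (id) now has one row at every timepoint
long <- data.frame(
  id   = rep(paths$id, times = 3),
  tp   = rep(c("a", "b", "c"), each = nrow(paths)),
  lbl  = c(paths$a, paths$b, paths$c),
  freq = rep(paths$freq, times = 3),
  stringsAsFactors = FALSE
)

print(tapply(long$freq, long$tp, sum))
```

Because `long` has the columns `tp`, `lbl`, `id` and `freq`, it should be usable as `plotAlluvial(long)` with the function from the question. The frequencies are fractional under this assumption, so the flows illustrate an expected split rather than observed subjects; if the actual per-subject trajectories are available, using them directly would be preferable.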
asked Nov 21 '18 at 13:40 by Robert
edited Nov 26 '18 at 9:21