Cfrgtkky

Question

First and foremost, thank you for taking your time to view/answer my question.

I asked this question before, albeit very un-clearly, however, I think I am close to a solution and hoping you can help me out!

I have 2 main df's: Customer (base table), and Top_Customers (a subset of the top n customers per quarter)

Both tables have the same layout, seen below:

Cust_ID   Date    QTR     Sales    Action   Link_Cust_ID

  1      1/1/18  2018 Q1   23       NA       NA

  1      1/2/18  2018 Q1   22.2     NA       NA

  1      1/3/18  2018 Q1   12.1     2        5

  1      1/4/18  2018 Q1   14.1     5        NA

  3      1/1/18  2018 Q1   101      NA       NA

  3      1/2/18  2018 Q1   55       2        18

 ...      ...      ...     ...     ...      ...

Sometimes - a customer might have reference to another customer - hence the link_cust_id column might be populated with the ID of another customer.
My Goal is: If the Action column == 2, I want to include that customer (from Link_Cust_ID) in my Top_Customer table from the date the link_cust_id is populated to the END OF THE QUARTER

For example, I would include Cust_ID = 5 from 1/3/18 to 3/31/18 (end of the quarter) from the above chart

I've been trying at this for a while now and have come up with the following code (which currently doesn't work, but I think the idea is there)

Linking_ID <- function(data)

{

if (data$link_type == 2)

{

temp.linkid <- data$link_cust_id[i] #stores the linked customer_id

temp.date   <- data$Date[i] #stores the date linking occurs

temp.data   <- customer_data %>% group_by(Quarter) %>% filter(customer_id = temp.linkid & Date >= temp.date)

#the above line of code is suppose to subset only link_customer_id data from the link_date to the end of the quarter

data <- rbind(data, temp.data)

}}

I am not great with loops, and try and not use them much in my code, but I might not have a choice in this situation. If you think another way might be better, please do suggest it!

dput for base table (all customers)

    structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5), 

               Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

                                  1514764800, 1514851200, 1514937600, 1515024000, 

                                  1514764800, 1514851200, 1514937600, 1515024000, 

                                  1514764800, 1514851200, 1514937600, 1515024000), 

                                class = c("POSIXct", "POSIXt"), tzone = "UTC"),

               Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                           "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                           "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                           "2018 Q1"), 

               Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56, 

                         55, NA, NA, 10.5, 11.1), 

               Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA), 

               Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)), 

          row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))

dput for top_customer tble:

structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date = 

structure(c(1514764800, 

1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600, 

1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 

Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1, 

14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2, 

NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA, 

-8L), class = c("tbl_df", "tbl", "data.frame"))

Hi @Mike, see the dput for the base table.. ill add the dput for the top_customer list soon — 10 hours ago

score 1 · Accepted Answer · 2018-11-12 16:56:37Z

You can do the following:

First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.

library(dplyr) 



link_ids_action2 <- customer %>% 

  select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>% 

  filter(Action == 2) %>% 

  mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31", 

                                       grepl("Q2", Quarter) ~ "2018-06-30", 

                                       grepl("Q3", Quarter) ~ "2018-09-30", 

                                       grepl("Q4", Quarter) ~ "2018-12-31"), 

         Quarter_end_dates = as.Date(Quarter_end_dates)) %>% 

  select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)



link_ids_action2

# A tibble: 2 x 4

#   Link_Cust_ID New_Cust_ID Start_date          Quarter_end_dates

#          <dbl>       <dbl> <dttm>              <date>           

# 1            5           1 2018-01-03 00:00:00 2018-03-31       

# 2           18           3 2018-01-02 00:00:00 2018-03-31

Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates to filter properly on your datetime. Select only those columns you need for your top_customer data.

new_top_customers <- 

  right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>% 

  filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>% 

  select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)



new_top_customers

# A tibble: 2 x 6

#   Cust_ID Date                Quarter Sales Action Link_Cust_ID

#     <dbl> <dttm>              <chr>   <dbl>  <dbl>        <dbl>

# 1       5 2018-01-03 00:00:00 2018 Q1  10.5     NA           NA

# 2       5 2018-01-04 00:00:00 2018 Q1  11.1     NA           NA

No with bind_rows you can add the new top customers:

bind_rows(top_customer, new_top_customers)



# A tibble: 10 x 6

#    Cust_ID Date                Quarter Sales Action Link_Cust_ID

#      <dbl> <dttm>              <chr>   <dbl>  <dbl>        <dbl>

#  1       1 2018-01-01 00:00:00 2018 Q1  23       NA           NA

#  2       1 2018-01-02 00:00:00 2018 Q1  22.2     NA           NA

#  3       1 2018-01-03 00:00:00 2018 Q1  12.1      2            5

#  4       1 2018-01-04 00:00:00 2018 Q1  14.1     NA           NA

#  5       3 2018-01-01 00:00:00 2018 Q1 101       NA           NA

#  6       3 2018-01-02 00:00:00 2018 Q1  55        2           18

#  7       3 2018-01-03 00:00:00 2018 Q1  56       NA           NA

#  8       3 2018-01-04 00:00:00 2018 Q1  55       NA           NA

#  9       5 2018-01-03 00:00:00 2018 Q1  10.5     NA           NA

# 10       5 2018-01-04 00:00:00 2018 Q1  11.1     NA           NA

Data

customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5), 

                           Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000), 

                                            class = c("POSIXct", "POSIXt"), tzone = "UTC"),

                           Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1"), 

                           Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56, 

                                     55, NA, NA, 10.5, 11.1), 

                           Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA), 

                           Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)), 

                      row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))





top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), 

                               Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

                                                  1514764800, 1514851200, 1514937600, 1515024000), 

                                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 

                               Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                           "2018 Q1", "2018 Q1", "2018 Q1"), 

                               Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55), 

                               Action = c(NA, NA, 2, NA, NA, 2, NA, NA), 

                               Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), 

                          row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help! — 7 hours ago

score 0 · Answer 2 · 2018-11-12 17:33:41Z

I think this might help. You don't need to do loops here.

all_cust <-  structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5), 

Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000), 

class = c("POSIXct", "POSIXt"), tzone = "UTC"),

Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1"), 

Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56, 

55, NA, NA, 10.5, 11.1), 

Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA), 

Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)), 

row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))



top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date = 

structure(c(1514764800, 

1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600, 

1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 

Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1, 

14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2, 

NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA, 

-8L), class = c("tbl_df", "tbl", "data.frame"))





library(dplyr)

#get where action is equal to 2

top_cust2 <- filter(top_cust, Action == 2)

#join on cust_id to link_cust_id

#Then filter where date is greater or equal too 

#reference date and in the same quarter

   bth <- inner_join(all_cust,top_cust2, 

                  by =c("Cust_ID"="Link_Cust_ID")) %>% 

        filter(Date.x >= Date.y, Quarter.x == Quarter.y) 

#remove .y columns from all_cust

bth <- bth[,!grepl(".y",colnames(bth))]

#drop .x from variable names

colnames(bth) <- gsub(".x","",colnames(bth))



cmb <- bind_rows(top_cust, bth)

score 1 · Accepted Answer · 2018-11-12 16:56:37Z

You can do the following:

First, extract those customers that have a linking customer ID with Action 2 and set the Quarter end dates.

library(dplyr) 



link_ids_action2 <- customer %>% 

  select(Cust_ID, Action, Link_Cust_ID, Date, Quarter) %>% 

  filter(Action == 2) %>% 

  mutate(Quarter_end_dates = case_when(grepl("Q1", Quarter) ~ "2018-03-31", 

                                       grepl("Q2", Quarter) ~ "2018-06-30", 

                                       grepl("Q3", Quarter) ~ "2018-09-30", 

                                       grepl("Q4", Quarter) ~ "2018-12-31"), 

         Quarter_end_dates = as.Date(Quarter_end_dates)) %>% 

  select(Link_Cust_ID, New_Cust_ID = Cust_ID, Start_date = Date, Quarter_end_dates)



link_ids_action2

# A tibble: 2 x 4

#   Link_Cust_ID New_Cust_ID Start_date          Quarter_end_dates

#          <dbl>       <dbl> <dttm>              <date>           

# 1            5           1 2018-01-03 00:00:00 2018-03-31       

# 2           18           3 2018-01-02 00:00:00 2018-03-31

Right join on the original data frame, which gives you only those which are the linked customers and filter by the dates. I had to add as.Dates to filter properly on your datetime. Select only those columns you need for your top_customer data.

new_top_customers <- 

  right_join(customer, link_ids_action2, by = c("Cust_ID" = "Link_Cust_ID")) %>% 

  filter(as.Date(Date) >= as.Date(Start_date), as.Date(Date) <= Quarter_end_dates) %>% 

  select(Cust_ID, Date, Quarter, Sales, Action, Link_Cust_ID)



new_top_customers

# A tibble: 2 x 6

#   Cust_ID Date                Quarter Sales Action Link_Cust_ID

#     <dbl> <dttm>              <chr>   <dbl>  <dbl>        <dbl>

# 1       5 2018-01-03 00:00:00 2018 Q1  10.5     NA           NA

# 2       5 2018-01-04 00:00:00 2018 Q1  11.1     NA           NA

No with bind_rows you can add the new top customers:

bind_rows(top_customer, new_top_customers)



# A tibble: 10 x 6

#    Cust_ID Date                Quarter Sales Action Link_Cust_ID

#      <dbl> <dttm>              <chr>   <dbl>  <dbl>        <dbl>

#  1       1 2018-01-01 00:00:00 2018 Q1  23       NA           NA

#  2       1 2018-01-02 00:00:00 2018 Q1  22.2     NA           NA

#  3       1 2018-01-03 00:00:00 2018 Q1  12.1      2            5

#  4       1 2018-01-04 00:00:00 2018 Q1  14.1     NA           NA

#  5       3 2018-01-01 00:00:00 2018 Q1 101       NA           NA

#  6       3 2018-01-02 00:00:00 2018 Q1  55        2           18

#  7       3 2018-01-03 00:00:00 2018 Q1  56       NA           NA

#  8       3 2018-01-04 00:00:00 2018 Q1  55       NA           NA

#  9       5 2018-01-03 00:00:00 2018 Q1  10.5     NA           NA

# 10       5 2018-01-04 00:00:00 2018 Q1  11.1     NA           NA

Data

customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5), 

                           Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000, 

                                              1514764800, 1514851200, 1514937600, 1515024000), 

                                            class = c("POSIXct", "POSIXt"), tzone = "UTC"),

                           Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                       "2018 Q1"), 

                           Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56, 

                                     55, NA, NA, 10.5, 11.1), 

                           Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA), 

                           Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)), 

                      row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))





top_customer <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), 

                               Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

                                                  1514764800, 1514851200, 1514937600, 1515024000), 

                                                class = c("POSIXct", "POSIXt"), tzone = "UTC"), 

                               Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

                                           "2018 Q1", "2018 Q1", "2018 Q1"), 

                               Sales = c(23, 22.2, 12.1, 14.1, 101, 55, 56, 55), 

                               Action = c(NA, NA, 2, NA, NA, 2, NA, NA), 

                               Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), 

                          row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"))

Hi @kath, thank you very much. I am working with your solution, and I think with a bit more effort on my part - I can use your sample solution as a template ... I really appreciate the help! — 7 hours ago

score 0 · Answer 4 · 2018-11-12 17:33:41Z

I think this might help. You don't need to do loops here.

all_cust <-  structure(list(Cust_ID = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 5, 5, 5, 5), 

Date = structure(c(1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000, 

1514764800, 1514851200, 1514937600, 1515024000), 

class = c("POSIXct", "POSIXt"), tzone = "UTC"),

Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1"), 

Sales = c(23, 22.2, 12.1, 14.1, 18, 18.8, 19.2, 19.8, 101, 55, 56, 

55, NA, NA, 10.5, 11.1), 

Action = c(NA, NA, 2, NA, NA, NA, NA, NA, NA, 2, NA, NA, NA, NA, NA, NA), 

Link_Cust_ID = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 18, NA, NA, NA, NA, NA, NA)), 

row.names = c(NA, -16L), class = c("tbl_df", "tbl", "data.frame"))



top_cust <- structure(list(Cust_ID = c(1, 1, 1, 1, 3, 3, 3, 3), Date = 

structure(c(1514764800, 

1514851200, 1514937600, 1515024000, 1514764800, 1514851200, 1514937600, 

1515024000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 

Quarter = c("2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", "2018 Q1", 

"2018 Q1", "2018 Q1", "2018 Q1"), Sales = c(23, 22.2, 12.1, 

14.1, 101, 55, 56, 55), Action = c(NA, NA, 2, NA, NA, 2, 

NA, NA), Link_Cust_ID = c(NA, NA, 5, NA, NA, 18, NA, NA)), row.names = c(NA, 

-8L), class = c("tbl_df", "tbl", "data.frame"))





library(dplyr)

#get where action is equal to 2

top_cust2 <- filter(top_cust, Action == 2)

#join on cust_id to link_cust_id

#Then filter where date is greater or equal too 

#reference date and in the same quarter

   bth <- inner_join(all_cust,top_cust2, 

                  by =c("Cust_ID"="Link_Cust_ID")) %>% 

        filter(Date.x >= Date.y, Quarter.x == Quarter.y) 

#remove .y columns from all_cust

bth <- bth[,!grepl(".y",colnames(bth))]

#drop .x from variable names

colnames(bth) <- gsub(".x","",colnames(bth))



cmb <- bind_rows(top_cust, bth)

搜尋此網誌

Cfrgtkky

Using Loops to pull in data for a certain range of dates depending on multiple column values

2 Answers
2

Your Answer

Post as a guest

2 Answers
2

2 Answers
2

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

Using Loops to pull in data for a certain range of dates depending on multiple column values

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

2 Answers 2

2 Answers 2

Sign up or log in

Post as a guest

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Sign up or log in

Post as a guest

Popular posts from this blog

Biblatex bibliography style without URLs when DOI exists (in Overleaf with Zotero bibliography)

ComboBox Display Member on multiple fields

Is it possible to collect Nectar points via Trainline?

2 Answers
2

2 Answers
2

2 Answers
2