How to calculate regression residuals in R for each individual in a longitudinal analysis?
I am working on a longitudinal/repeated measures multilevel model (MLM). Usually, for time-varying covariates (in my case "weekly gross income/1000"), you would calculate a person-mean centered version of the variable (i.e. subtracting the person's average weekly income across all of that person's time points from each person-year income response). However, this can lead to bias (see here), and hence a better (more generalisable) approach is to center each person's values around that person's own regression line on time (as it happens, the residuals from the per-person regression serve this purpose).
Therefore, I need to calculate the following regression, but for each individual (roughly 10,000 individuals with 25,000 observations):
lm(Weekly_Gross_Pay_Main_Job~nYear, data=df)
Then, the really critical part is that I need to extract the residuals to a separate column in my main dataset, matched up with each person. These residuals will take the place of my group-mean centered variable (which will in turn be used in my MLM).
Here is a possible starting point using the function that I have for the group-mean centering. If this could be updated to fit a regression with the residuals output for each person, then that would be ideal (if not, then I am open to other approaches):
#Group mean-centering a variable. Relevant for L1 variables only.
gmc = function(variable, group){
return(ave(variable, group, FUN = function(x){x - mean(x)}))
}
df$Weekly_Gross_Pay_Main_Jobgmc <- gmc(df$Weekly_Gross_Pay_Main_Job, df$Person_ID)
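As a quick sanity check of what gmc returns, here is a minimal sketch using only the first two people from the extract below. The vectors pay and id are hypothetical names introduced for this illustration; the values are copied from the extract.

```r
# Same group-mean centering helper as above.
gmc <- function(variable, group) {
  ave(variable, group, FUN = function(x) x - mean(x))
}

# First seven rows of the extract: persons 100003 (3 rows) and 100006 (4 rows).
pay <- c(0, 0.58, 0.35, 0.035, 0.65, 0.195, 0.43)
id  <- c(100003L, 100003L, 100003L, 100006L, 100006L, 100006L, 100006L)

# Each value minus that person's own mean (0.31 and 0.3275 respectively).
centered <- gmc(pay, id)
print(centered)
```

The result should agree with the first seven values of Weekly_Gross_Pay_Main_Jobgmc in the extract, and each person's centered values sum to zero.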
Data extract in long format (where Person_ID is the person, nYear is time, Weekly_Gross_Pay_Main_Job is weekly income/1000 and Weekly_Gross_Pay_Main_Jobgmc is the group-mean centered version):
structure(list(Person_ID = c(100003L, 100003L, 100003L, 100006L,
100006L, 100006L, 100006L, 100010L, 100010L, 100010L, 100010L,
100010L, 100010L, 100011L, 100014L, 100014L, 100014L, 100014L,
100014L, 100016L, 100018L, 100018L, 100018L, 100018L, 100018L,
100018L, 100018L, 100018L, 100018L, 100020L, 100020L, 100020L,
100020L, 100020L, 100020L, 100020L, 100020L, 100020L, 100021L,
100021L, 100024L, 100024L, 100024L, 100024L, 100024L, 100024L,
100024L, 100024L, 100024L, 100024L, 100025L, 100025L, 100025L,
100025L, 100025L, 100025L, 100025L, 100025L, 100027L, 100027L,
100027L, 100027L, 100029L, 100029L, 100029L, 100029L, 100029L,
100031L, 100031L, 100031L, 100032L, 100032L, 100032L, 100033L,
100033L, 100033L, 100033L, 100033L, 100033L, 100034L, 100034L,
100034L, 100037L, 100037L, 100037L, 100037L, 100037L, 100037L,
100037L, 100044L, 100044L, 100044L, 100044L, 100044L, 100044L,
100044L, 100045L, 100045L, 100045L, 100045L), nYear = c(5L, 6L,
7L, 2L, 3L, 4L, 6L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 5L, 6L, 7L,
8L, 9L, 5L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 1L, 2L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
13L, 14L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 4L, 5L, 6L, 1L, 2L, 3L, 3L, 4L, 5L, 6L, 7L, 8L,
2L, 3L, 5L, 5L, 6L, 7L, 8L, 9L, 11L, 13L, 2L, 3L, 4L, 6L, 7L,
8L, 9L, 4L, 5L, 6L, 7L), Weekly_Gross_Pay_Main_Job = c(0, 0.58,
0.35, 0.035, 0.65, 0.195, 0.43, 0, 0, 0, 0, 0, 0, 0.12, 1.653,
0.967, 1.742, 1.323, 0, 0.709, 0.155, 0.431, 0.235, 0.17, 0.285,
0.357, 0.28, 0.335, 0.375, 0.111, 0.333, 0.582, 0.882, 0.85,
0.944, 1.615, 1.615, 1.35, 0.168, 0.08, 0, 0, 0, 0, 0, 0, 0,
0.134, 0.737, 0, 0.02, 0.372, 0.1, 0.014, 0.307, 0.39, 0.671,
0.5, 0.278, 0.32, 0.425, 0.4, 0.57, 0.917, 0.75, 0.402, 0.437,
0.211, 0.537, 0.54, 0.135, 0.15, 0.65, 0.324, 0.399, 0.497, 0.67,
0.825, 0.825, 0.25, 0.319, 0.35, 0.885, 0.941, 0.975, 0.975,
1.02, 1.096, 1.148, 0.1, 0.11, 0.413, 0.477, 0.578, 0.686, 0.686,
0.511, 0.578, 0.8, 0.75), Weekly_Gross_Pay_Main_Jobgmc = c(-0.31,
0.27, 0.04, -0.2925, 0.3225, -0.1325, 0.1025, 0, 0, 0, 0, 0,
0, 0, 0.516, -0.17, 0.605, 0.186, -1.137, 0, -0.136444444444444,
0.139555555555556, -0.0564444444444445, -0.121444444444444, -0.00644444444444447,
0.0655555555555555, -0.0114444444444444, 0.0435555555555556,
0.0835555555555555, -0.809222222222222, -0.587222222222222, -0.338222222222222,
-0.0382222222222223, -0.0702222222222223, 0.0237777777777777,
0.694777777777778, 0.694777777777778, 0.429777777777778, 0.044,
-0.044, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871, -0.0871,
-0.0871, 0.0469, 0.6499, -0.0871, -0.27675, 0.07525, -0.19675,
-0.28275, 0.01025, 0.09325, 0.37425, 0.20325, -0.07775, -0.03575,
0.06925, 0.04425, -0.0452, 0.3018, 0.1348, -0.2132, -0.1782,
-0.218333333333333, 0.107666666666667, 0.110666666666667, -0.176666666666667,
-0.161666666666667, 0.338333333333333, -0.266, -0.191, -0.093,
0.0800000000000001, 0.235, 0.235, -0.0563333333333333, 0.0126666666666667,
0.0436666666666666, -0.120714285714286, -0.0647142857142858,
-0.0307142857142858, -0.0307142857142858, 0.0142857142857142,
0.0902857142857143, 0.142285714285714, -0.335714285714286, -0.325714285714286,
-0.0227142857142857, 0.0412857142857143, 0.142285714285714, 0.250285714285714,
0.250285714285714, -0.1368, -0.0698000000000001, 0.1522, 0.1022
)), row.names = c(NA, 100L), class = "data.frame")
r regression longitudinal multilevel-analysis
asked Nov 20 '18 at 3:16
aspark2020
2 Answers
Here is how I ended up doing it:
#Before you begin, time needs to be grand-mean centered.
df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)
#Now to regress the time-varying covariate onto grand-mean centered time and complete the process.
#First, build a Person_Year merge key and create a grouped data frame called `by_Person`.
df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
by_Person <- dplyr::group_by(df, Person_ID)
#Second, regress the time-varying covariate onto the newly created grand-mean centered time variable and merge with the main data frame.
df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, broom::augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))
df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")
#Third, copy over the required columns (renaming them would be more efficient, but either way).
df$RegResGrossPay <- df$.resid
#Fourth, do an optional tidy up.
colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
df$Person_ID.y <- NULL
df$nYearmc.y <- NULL
df$Weekly_Gross_Pay_Main_Job.y <- NULL
df$.fitted <- NULL
df$.se.fit <- NULL
df$.resid <- NULL
df$.hat <- NULL
df$.sigma <- NULL
df$.cooksd <- NULL
df$.std.resid <- NULL
df.Weekly_Gross_Pay_Main_Job <- NULL
#Fifth, generate plots of the variables you need.
library(ggplot2)
ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha =1/3)+geom_smooth(method="lm",se=FALSE)
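A more compact alternative, offered as a sketch rather than as what the code above does: fit the per-person regression inside a loop over row indices and write the residuals straight back into their original rows, so no merge key is needed. The helper name resid_by_group and the toy data frame demo are hypothetical; the values are selected rows of the extract in the question. Because shifting time by a constant leaves the residuals of a model with an intercept unchanged, raw nYear is used here instead of the grand-mean centered version.

```r
# Hypothetical helper: residuals of y regressed on time, fitted separately
# within each group, returned in the original row order.
resid_by_group <- function(y, time, group) {
  out <- rep(NA_real_, length(y))
  # split() on the row indices gives one integer vector of rows per group.
  for (rows in split(seq_along(y), group)) {
    # Use only rows where both the outcome and time are observed.
    ok <- rows[!is.na(y[rows]) & !is.na(time[rows])]
    if (length(ok) == 0) next
    out[ok] <- residuals(lm(y[ok] ~ time[ok]))
  }
  out
}

# Toy data: persons 100003, 100006 and 100011 from the question's extract.
demo <- data.frame(
  Person_ID = c(100003L, 100003L, 100003L, 100006L, 100006L, 100006L, 100006L, 100011L),
  nYear = c(5L, 6L, 7L, 2L, 3L, 4L, 6L, 1L),
  Weekly_Gross_Pay_Main_Job = c(0, 0.58, 0.35, 0.035, 0.65, 0.195, 0.43, 0.12)
)

demo$RegResGrossPay <- resid_by_group(
  demo$Weekly_Gross_Pay_Main_Job, demo$nYear, demo$Person_ID
)
print(demo)
```

Note that anyone observed at only one or two time points gets residuals of (numerically) zero, since a straight line fits one or two points exactly. With roughly 2.5 observations per person on average in the full data, that may matter for the MLM.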
Not sure if I'm reading you right, and this might be a very naive answer that misses the point, but doesn't residuals() just work?
Here's a linear mixed-effects model fitted to some data I had lying around:
library(nlme)  # provides lme()
some.model<-lme(DV~IV, random=~1|Id, data=df)
head(residuals(some.model))
7 7 24 24 32 32
-0.054135825 -0.054135825 0.064271638 0.064271638 -0.001975424 -0.001975424
If you really want to put the residuals into a column with the ID number next to them, it takes a few more steps. It can probably be done in a single step, but I'm not very good at this.
extra.column<-residuals(some.model)
extra.column.id<-names(residuals(some.model))
cbind(extra.column,extra.column.id)
extra.column extra.column.id
7 "-0.0541358252373243" "7"
7 "-0.0541358252373243" "7"
24 "0.0642716380035857" "24"
24 "0.0642716380035857" "24"
32 "-0.0019754241828096" "32"
32 "-0.0019754241828096" "32"
Sorry if this is not what you're looking for, but check out the residuals command.
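One way to keep the residuals numeric and lined up with the rows is to assign them directly as a new data frame column instead of using cbind, which turns everything into character strings as the output above shows. The following is only a sketch under assumptions: it uses lm rather than lme so that it runs in base R, the data frame toy and its values are made up, and only the column names DV, IV and Id are taken from the code above. Whether lme pads missing rows in the same way is something I have not verified.

```r
# Made-up toy data; one DV value is missing on purpose.
toy <- data.frame(
  Id = c(7, 7, 24, 24, 32, 32),
  IV = c(1, 2, 1, 2, 1, 2),
  DV = c(0.5, NA, 0.9, 1.4, 0.2, 0.1)
)

# na.exclude drops incomplete rows for fitting, but residuals() then returns
# NA in their place, so the result has one element per row of toy.
fit <- lm(DV ~ IV, data = toy, na.action = na.exclude)

# Residuals stay numeric and sit next to the Id they belong to.
toy$resid <- residuals(fit)
print(toy)
```

With the default na.action (na.omit), residuals() would return only five values here and the assignment would fail, which is why na.exclude is set explicitly.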
answered Nov 22 '18 at 9:39
Huy Pham
1315
1315
add a comment |
add a comment |
Here is how I ended up doing it:
#Before you begin, time needs to be grand-mean centered.
df$nYearmc <- df$nYear-mean(df$nYear, na.rm=TRUE)
#Now to regress the time-varying covariate onto grand-mean centered time and complete the process.
#First, create a group called `by_person`.
df <- tidyr::unite(df, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
by_Person <- dplyr::group_by(df, Person_ID)
#Second, regress the time-varying covariate onto the newly created grand-mean centered time variable and merge with the main data frame.
df.Weekly_Gross_Pay_Main_Job <- dplyr::do(by_Person, augment(lm(Weekly_Gross_Pay_Main_Job~nYearmc, data=.)))
df.Weekly_Gross_Pay_Main_Job <- tidyr::unite(df.Weekly_Gross_Pay_Main_Job, Person_Year, c(Person_ID, nYearmc), remove=FALSE)
df <- merge(df, df.Weekly_Gross_Pay_Main_Job, by="Person_Year")
#Third, copy over the required columns (renaming them would be more efficient, but either way).
df$RegResGrossPay <- df$.resid
#Fourth, do an optional tidy up.
colnames(df)[colnames(df)=="Person_ID.x"] <- "Person_ID"
colnames(df)[colnames(df)=="nYearmc.x"] <- "nYearmc"
colnames(df)[colnames(df)=="Weekly_Gross_Pay_Main_Job.x"] <- "Weekly_Gross_Pay_Main_Job"
df$Person_ID.y <- NULL
df$nYearmc.y <- NULL
df$Weekly_Gross_Pay_Main_Job.y <- NULL
df$.fitted <- NULL
df$.se.fit <- NULL
df$.resid <- NULL
df$.hat <- NULL
df$.sigma <- NULL
df$.cooksd <- NULL
df$.std.resid <- NULL
df.Weekly_Gross_Pay_Main_Job <- NULL
#Fifth, generate plots of the variables you need.
ggplot(df, aes(nYearmc, RegResGrossPay))+geom_line(aes(group=Person_ID), alpha =1/3)+geom_smooth(method="lm",se=FALSE)
answered Nov 27 '18 at 6:17
aspark2020