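- The reshape2 package is a faster rewrite of the reshape package.
- It provides melt(), which converts wide-format data (many columns) into long format (many rows),
- and the cast functions dcast() and acast(), which convert long-format data back into wide format.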
1. Converting wide data to long: the melt() function
melt(data, id.vars = "identifier columns", measure.vars = "columns to melt")
> library(reshape2)
> names(airquality) <- tolower(names(airquality))
> melt_test <- melt(airquality, id.vars = c("month", "wind"),
+                   measure.vars = "ozone")
> head(melt_test)
  month wind variable value
1     5  7.4    ozone    41
2     5  8.0    ozone    36
3     5 12.6    ozone    12
4     5 11.5    ozone    18
5     5 14.3    ozone    NA
6     5 14.9    ozone    28
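As a minimal sketch of the same idea on a smaller scale, the toy data frame below (scores and its columns are hypothetical, made up for illustration) is melted without measure.vars, in which case melt() treats every non-identifier column as a measured variable. The variable.name and value.name arguments only rename the two output columns.

```r
library(reshape2)

# Hypothetical toy data: one row per student, one column per subject
scores <- data.frame(
  student = c("a", "b"),
  math    = c(90, 75),
  english = c(80, 85)
)

# measure.vars is omitted, so every column other than "student" is melted
long <- melt(scores, id.vars = "student",
             variable.name = "subject", value.name = "score")

print(long)
```

Each student now occupies one row per subject, so the two-row wide table becomes a four-row long table.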
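2. Converting long data to wide: the cast functions
- In reshape2, cast() is split into two functions depending on the type of output.
- acast(): reshapes the data and returns a vector, matrix, or array
- dcast(): reshapes the data and returns a data frame
dcast(data, identifier columns ~ column to spread)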
> aq_melt <- melt(airquality, id.vars = c("month","day"), na.rm = TRUE)
> head(aq_melt)
  month day variable value
1     5   1    ozone    41
2     5   2    ozone    36
3     5   3    ozone    12
4     5   4    ozone    18
6     5   6    ozone    28
7     5   7    ozone    23
> aq_dcast <- dcast(aq_melt, month + day ~ variable)  # keep month and day as identifier columns; spread variable into columns
> head(aq_dcast)
  month day ozone solar.r wind temp
1     5   1    41     190  7.4   67
2     5   2    36     118  8.0   72
3     5   3    12     149 12.6   74
4     5   4    18     313 11.5   62
5     5   5    NA      NA 14.3   56
6     5   6    28      NA 14.9   66
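The melt-then-dcast round trip can be checked with a short sketch. It works on a copy named aq (a name chosen here for illustration) so the built-in airquality data set is left untouched. Rows dropped by na.rm = TRUE during melting come back as NA cells after casting, because every (month, day) pair still has wind and temp values.

```r
library(reshape2)

# Work on a copy so the built-in data set is not modified
aq <- airquality
names(aq) <- tolower(names(aq))

# Long format, dropping rows whose value is NA
aq_melt <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# Back to wide format: one row per (month, day), one column per variable
aq_wide <- dcast(aq_melt, month + day ~ variable)

nrow(aq_wide)               # number of days recovered
sum(is.na(aq_wide$ozone))   # missing ozone readings reappear as NA
```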
acast(data, row variable ~ column variable ~ slice variable)
> head(acast(aq_melt, day ~ month ~ variable))  # day in rows, month in columns, one slice of the array per variable
, , ozone
   5  6   7  8  9
1 41 NA 135 39 96
2 36 NA  49  9 78
3 12 NA  32 16 73
4 18 NA  NA 78 91
5 NA NA  64 35 47
6 28 NA  40 66 32
, , solar.r
    5   6   7  8   9
1 190 286 269 83 167
2 118 287 248 24 197
3 149 242 236 77 183
4 313 186 101 NA 189
5  NA 220 175 NA  95
6  NA 264 314 NA  92
- Arranging the data set as an array with acast() makes it easy to compare values variable by variable at a glance.
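A short sketch of how the resulting array might be inspected follows, again using a copy named aq (a name chosen here for illustration). The three formula terms become the three dimensions, and the dimension names are character strings, so individual cells can be looked up by name.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))
aq_melt <- melt(aq, id.vars = c("month", "day"), na.rm = TRUE)

# day -> rows, month -> columns, variable -> slices
arr <- acast(aq_melt, day ~ month ~ variable)

dim(arr)                  # days x months x variables
arr["1", "5", "ozone"]    # ozone reading on May 1
arr[, , "temp"]           # the full day-by-month matrix for temp
```

Days that do not exist in a month (for example day 31 in June) appear as NA, because the array has to be rectangular.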
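3. Summarizing data with the cast functions
- A distinguishing feature of the cast functions is that they can also summarize (aggregate) the data.
> acast(aq_melt, month ~ variable, mean)  # pass the summary function by name only, without parentheses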
     ozone  solar.r      wind     temp
5 23.61538 181.2963 11.622581 65.54839
6 29.44444 190.1667 10.266667 79.10000
7 59.11538 216.4839  8.941935 83.90323
8 59.96154 171.8571  8.793548 83.96774
9 31.44828 167.4333 10.180000 76.90000
> dcast(aq_melt, month ~ variable, sum)  # sum of each variable per month
  month ozone solar.r  wind temp
1     5   614    4895 360.3 2032
2     6   265    5705 308.0 2373
3     7  1537    6711 277.2 2601
4     8  1559    4812 272.6 2603
5     9   912    5023 305.4 2307
> dcast(aq_melt, month ~ variable, length)  # number of non-missing values per month (NAs were dropped by melt)
  month ozone solar.r wind temp
1     5    26      27   31   31
2     6     9      30   30   30
3     7    26      31   31   31
4     8    26      28   31   31
5     9    29      30   30   30
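The summaries above rely on the NAs having been dropped when the data was melted. As a sketch of an alternative, extra arguments given after the aggregation function are passed on to it, so the NAs can be kept in the long data and skipped at aggregation time instead. The object names aq, by_melt and by_cast are chosen here for illustration.

```r
library(reshape2)

aq <- airquality
names(aq) <- tolower(names(aq))

# Option 1: drop NAs while melting, then aggregate
by_melt <- dcast(melt(aq, id.vars = c("month", "day"), na.rm = TRUE),
                 month ~ variable, mean)

# Option 2: keep NAs in the long data and let mean() skip them;
# na.rm = TRUE is forwarded to the aggregation function
by_cast <- dcast(melt(aq, id.vars = c("month", "day")),
                 month ~ variable, mean, na.rm = TRUE)

all.equal(by_melt, by_cast)   # both routes give the same monthly means
```

The second form is convenient when the same long data frame is also needed for counts that should include the missing rows.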
[Reference]
reshape2: a reboot of the reshape package
Reshape2 is a reboot of the reshape package. It's been over five years since the first release of the package, and in that time I've learned a tremendous amount about R programming, and how to work with data in R. Reshape2 uses that knowledge to make a new package for reshaping data that is much more focussed and much much faster.
This version improves speed at the cost of functionality, so I have renamed it to reshape2 to avoid causing problems for existing users. Based on user feedback I may reintroduce some of these features.
What's new in reshape2:
- considerably faster and more memory efficient thanks to a much better underlying algorithm that uses the power and speed of subsetting to the fullest extent, in most cases only making a single copy of the data.
- cast is replaced by two functions depending on the output type: dcast produces data frames, and acast produces matrices/arrays.
- multidimensional margins are now possible: grand_row and grand_col have been dropped: now the name of the margin refers to the variable that has its value set to (all).
- some features have been removed such as the | cast operator, and the ability to return multiple values from an aggregation function. I'm reasonably sure both these operations are better performed by plyr.
- a new cast syntax which allows you to reshape based on functions of variables (based on the same underlying syntax as plyr).
- better development practices like namespaces and tests.
*Source: https://stat.ethz.ch/pipermail/r-packages/2010/001169.html