金魚亭日常

読書,ガジェット,競技プログラミング

readxl で 日付と文字列が混在している列を読み込む

readxlExcelファイルを読み込むときに,日付列が文字列と日付の混在だった場合,とりあえず col_typeslist にして読み込む.

d <- read_excel("data.xlsx", 
                skip = 2, 
                col_names = c("Date", "Temp", "RH", "WindSpd", "GustSpd", "Rain", "WindDir"), 
                col_types = c("list", "numeric", "numeric", "numeric", "numeric", "numeric", "numeric"))
head(d)
# A tibble: 6 x 7
  Date        Temp    RH WindSpd GustSpd  Rain WindDir
  <list>     <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 <dttm [1]>  28.4  82.5   0.       0.   NA       1.40
2 <dttm [1]>  28.9  77.5   0.250    3.02  0.     78.6 
3 <dttm [1]>  31.9  61.2   0.500    2.27  0.    135.  
4 <dttm [1]>  25.6  82.6   0.250    4.53  0.    129.  
5 <dttm [1]>  22.8  95.3   0.       1.76  8.60  170.  
6 <dttm [1]>  22.8  96.1   0.       0.    0.    147.  

次に,変換する関数を定義.

形式が mdy HMS p で,日付で読まれた方もコケているので修正する(lubridateを使用)

f_parse_date_column <- function(x){
  if (is.character(x[[1]])) {
    parse_date_time(x[[1]], order = "mdy HMS p")
  } else {
    this_date = x[[1]]
    y = year(this_date)
    m = month(this_date)
    d = day(this_date)
    hr = hour(this_date)
    mn = minute(this_date)
    sc = second(this_date)
    make_datetime(y, d, m, hr, mn, sc)
  }
}

最後に,dplyr::mutate で変換.

dd <- d %>% 
  rowwise() %>% 
  mutate(Date = f_parse_date_column(Date))
head(dd)
# A tibble: 6 x 7
  Date                 Temp    RH WindSpd GustSpd  Rain WindDir
  <dttm>              <dbl> <dbl>   <dbl>   <dbl> <dbl>   <dbl>
1 2015-03-04 14:14:54  28.4  82.5   0.       0.   NA       1.40
2 2015-03-04 15:14:54  28.9  77.5   0.250    3.02  0.     78.6 
3 2015-03-04 16:14:54  31.9  61.2   0.500    2.27  0.    135.  
4 2015-03-04 17:14:54  25.6  82.6   0.250    4.53  0.    129.  
5 2015-03-04 18:14:54  22.8  95.3   0.       1.76  8.60  170.  
6 2015-03-04 19:14:54  22.8  96.1   0.       0.    0.    147.