Wednesday, October 12, 2011

Tricks I learned today #1: as.integer() on factor levels

I normally work with fully numerical data, not categorical data. When reading a file with read.csv(), R seems to recognize such categories and marks the column as a factor with levels. This is useful indeed. However, I wanted to make a PCA biplot of this data, so I was looking for a way to convert the categories to class numbers. After some googling, Anna and I ran into as.integer(), which can be used on a factor. So, today I learned this trick:

> a = as.factor(c("A", "B", "A", "C"))
> b = as.integer(factor(a))

Well, probably basic to many, it was new to me :)
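
To see what the conversion actually produces, here is a small sketch in base R. The integers are the positions of each value in levels(a), which are sorted alphabetically by default, so they are class codes rather than measurements. The second half is an aside that is not part of the original trick: the factor f is made up for illustration, and it shows that when the labels themselves look like numbers, as.integer() still returns the level codes, so you have to go through as.character() to get the labels back.

```r
a <- as.factor(c("A", "B", "A", "C"))
levels(a)            # the distinct categories, in sorted order
b <- as.integer(a)   # position of each element within levels(a)
b

# Hypothetical factor whose labels happen to be numbers
f <- factor(c("10", "20", "10"))
codes  <- as.integer(f)                 # level codes, not the labels
values <- as.integer(as.character(f))   # the labels, parsed as integers
```

Printing codes and values side by side makes the difference obvious, and it is probably worth checking before feeding such a column into a PCA.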

Now I am wondering whether it is equally easy to convert it into a multi-column matrix where each column indicates class membership (thus resulting in three columns for the above). That's another trick I need to learn...
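
A minimal sketch of what such a class membership matrix could look like, using only base R. This is just one possible approach, not necessarily the best one: it loops over the levels and marks, per level, which elements belong to it. The second variant uses model.matrix() from the stats package with the intercept dropped; it gives the same 0/1 pattern, but prefixes the column names with the variable name.

```r
a <- as.factor(c("A", "B", "A", "C"))

# One column per level; 1 where the element belongs to that level, else 0
m <- sapply(levels(a), function(l) as.integer(a == l))
m

# Alternative: a design matrix without intercept (columns aA, aB, aC)
mm <- model.matrix(~ a - 1)
mm
```

The first version keeps the plain level names as column names, which is convenient when the matrix is used for labelling a plot.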


  1. a = as.factor(c("A", "B", "A", "C"))
    m <- do.call('rbind', sapply(as.numeric(a), function(x,l) {
      ret <- rep(FALSE, l)
      ret[x] <- TRUE
      ret  # return the full indicator vector, not just the assigned value
    }, l=length(levels(a)), simplify=FALSE))
    dimnames(m)[[2]] <- levels(a)

  2. a = as.factor(c("A", "B", "A", "C"))
    m <- t(sapply(a, function(x) { as.integer(x == levels(x)) }))
    colnames(m) <- levels(a)

  3. Rajarshi, Anonymous, both work lovely! Thanx!

  4. Back to numbers/factors... To disable automatic conversion to factors at startup, add this line to your .Rprofile:

    options(stringsAsFactors = FALSE)

  5. one more...

    a = as.factor(c("A", "B", "A", "C"))
    b = as.integer(factor(a))
    library( kohonen )
    classvec2classmat( b )

  6. Paul, ah nice! How interesting to use the kohonen package here!

    How could I have forgotten that method! I used it 5 years ago :) Hahahaha!