CoCA: Concept Class Analysis



Concept Class Analysis (CoCA) is a method for grouping documents based on the schematic similarities in their engagement with multiple semantic directions. CoCA generalizes Correlational Class Analysis, a method originally developed for survey data. We outline this method in more detail in our Sociological Science paper, "Concept Class Analysis: A Method for Identifying Cultural Schemas in Texts."
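To make "schematic similarity" more concrete: in Correlational Class Analysis, cases are compared by correlating their profiles across items and then ignoring the sign. Two documents whose engagement profiles are mirror images therefore count as sharing the same schema. The minimal base R sketch below uses invented engagement scores (the matrix engagement and the document names are hypothetical), not output from CoCA().

```r
# Hypothetical engagement scores: rows are documents, columns are three
# semantic directions (values invented for illustration only)
engagement <- rbind(
  doc_a = c( 0.6,  0.2, -0.4),
  doc_b = c(-0.6, -0.2,  0.4),   # mirror image of doc_a
  doc_c = c( 0.1,  0.5,  0.3)
)

# Correlate documents (rows) across the directions
r <- cor(t(engagement))

# Schematic similarity takes the absolute correlation, so doc_a and doc_b
# (r = -1) are treated as organized by the same opposition
schematic_sim <- abs(r)
round(schematic_sim, 2)
```

Taking the absolute value is what lets CoCA group documents by how they structure the oppositions, regardless of which pole they favor.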

After getting familiar with CMDist(), the first step in using CoCA is to build two or more semantic directions. For example, here are three semantic directions related to socio-economic status. Note that you must first load or train word embeddings (stored below as my_wv).

    # build juxtaposed pairs for each semantic direction
    pairs_01 <- data.frame(additions = c("rich", "richer", "affluence", "wealthy"),
                           subtracts = c("poor", "poorer", "poverty", "impoverished"))

    pairs_02 <- data.frame(additions = c("skilled", "competent", "proficient", "adept"),
                           subtracts = c("unskilled", "incompetent", "inproficient", "inept"))

    pairs_03 <- data.frame(additions = c("educated", "learned", "trained", "literate"),
                           subtracts = c("uneducated", "unlearned", "untrained", "illiterate"))

    # get the vectors for each direction
    sd_01 <- get_direction(pairs_01, my_wv)
    sd_02 <- get_direction(pairs_02, my_wv)
    sd_03 <- get_direction(pairs_03, my_wv)

    # row bind each direction
    sem_dirs <- rbind(sd_01, sd_02, sd_03)
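As a rough illustration of what a paired direction captures, the sketch below subtracts each "subtracts" term's vector from its paired "additions" term's vector and averages the differences. It uses a hypothetical 3-dimensional toy_wv with invented values instead of real embeddings, and it only approximates the idea; get_direction() may handle details such as normalization or missing terms differently.

```r
# Hypothetical toy embeddings: 4 terms in 3 dimensions (values invented)
toy_wv <- rbind(
  rich         = c(0.9, 0.1, 0.3),
  poor         = c(0.1, 0.2, 0.3),
  wealthy      = c(0.8, 0.0, 0.4),
  impoverished = c(0.2, 0.1, 0.4)
)

# Two juxtaposed pairs, in the same layout as pairs_01 above
toy_pairs <- data.frame(additions = c("rich", "wealthy"),
                        subtracts = c("poor", "impoverished"),
                        stringsAsFactors = FALSE)

# Subtract each paired vector, then average the differences across pairs
diffs <- toy_wv[toy_pairs$additions, , drop = FALSE] -
         toy_wv[toy_pairs$subtracts, , drop = FALSE]
toy_direction <- colMeans(diffs)
toy_direction
```

Averaging over several pairs is meant to smooth out the idiosyncrasies of any single word pair, which is why each direction above uses four pairs rather than one.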

Next, we feed our document-term matrix, word embeddings matrix, and the matrix of semantic directions from above to the CoCA() function:

    classes <- CoCA(my_dtm, wv = my_wv, directions = sem_dirs)
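Conceptually, CoCA scores each document's engagement with every direction and then groups documents whose engagement profiles are strongly correlated in absolute value. The base R sketch below walks through that logic on invented toy inputs (toy_wv, toy_dtm, and toy_dirs are all hypothetical). It makes two plain substitutions: engagement is approximated by the cosine similarity between a document's count-weighted centroid and each direction, rather than Concept Mover's Distance, and the grouping step uses hierarchical clustering rather than CoCA's own partitioning method.

```r
# Hypothetical toy embeddings: 3 terms in 3 dimensions (values invented)
toy_wv <- rbind(
  term_a = c( 1.0,  0.5, -1.0),
  term_b = c(-1.0, -0.5,  1.0),
  term_c = c( 0.2,  1.0,  0.2)
)

# Hypothetical document-term matrix of raw counts
toy_dtm <- rbind(
  doc1 = c(term_a = 2, term_b = 0, term_c = 0),
  doc2 = c(term_a = 3, term_b = 0, term_c = 1),
  doc3 = c(term_a = 0, term_b = 2, term_c = 0),
  doc4 = c(term_a = 0, term_b = 0, term_c = 2)
)

# Three hypothetical semantic directions, one per row
toy_dirs <- diag(3)
rownames(toy_dirs) <- c("status", "skill", "education")

# Stand-in for CMD: cosine between each document centroid and each direction
centroids <- (toy_dtm %*% toy_wv[colnames(toy_dtm), ]) / rowSums(toy_dtm)
cosine <- function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))
engagement <- t(apply(centroids, 1, function(doc) {
  apply(toy_dirs, 1, cosine, x = doc)
}))

# Schematic similarity: absolute correlation between engagement profiles
sim <- abs(cor(t(engagement)))

# Swapped-in grouping step: average-linkage clustering on 1 - |r|
tree <- hclust(as.dist(1 - sim), method = "average")
groups <- cutree(tree, k = 2)
groups
```

Here doc1 and doc3 land in the same group even though they lean toward opposite poles, because their profiles are perfectly anti-correlated; doc4 follows a different pattern and is separated.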

Finally, using the plot() function, we can generate simple visualizations of the schematic classes found:

    # a quick plot; designate which module to plot with `module =`
    plot(classes, module = 1)