Your Location is: Home > Mysql

Is there an SQL command to count frequencies of a value in different columns?

From: Australien View: 2009 Kevin Lucas 

Question

I have a very large dataset of donations to educational projects. I have done some processing and for this question there are three tables of interest: Project, Funding and Category.

  • Project contains the project ID, some other negligible info (e.g. date started), and the category ID the project belongs to. Projects can belong to one or two categories, and so there are two columns for each project. If a project only belongs to one category, category 2 is NULL. In total there are 8 categories, with category ID's going from 1 to 8.
  • Funding contains the project ID, some other negligible info (e.g. total cost), and the current status of the project. This is either 'fully funded' or 'expired', as all projects are done.
  • Category only contains 2 columns, one with the 8 category ID's and the other with the category names (1 - Sports, 2 - Science, etc).
*Project*
project_id  category_id1 category_id2
...             ...         ...
...             ...         ...

*Funding*
project_id  status
...          ...
...          ...

*Category*
Category_ID   project_category
...                   ...
...                   ...

I'm now trying to find out for each category the percentage of those fully funded, which would be (fully funded) / (fully funded + expired). However, I can't seem to find a way to make SQL count instances for each category regardless of whether they are in category column 1 or category column 2 of 'Project' table. This is the code I have so far with its output:

SELECT project_category, status, count(project_category)
FROM Project
INNER JOIN Category ON Project.Category_ID1 = Category.Category_ID
INNER JOIN Funding ON Project.project_id = Funding.project_id
GROUP BY project_category, status 
project_category    status    count(project_category)
Applied Learning    Expired        4003
Applied Learning  Fully Funded     11441
Essentials          Expired           16
Essentials        Fully Funded       219
Health & Sports     Expired         1235
Health & Sports   Fully Funded      4518
...                 ....             ...
...                 ....             ...

This output only counts the categories from project.category_id1. I could just make another table for project.category_id2 and add them up manually, but I would rather have it one table. Is there a way to do this? Thanks for trying to help!!

Best answer

You can unpivot and then aggregate:

SELECT c.project_category, f.status, count(*)
FROM (SELECT p.project_id1 as project_id, p.Category_ID FROM Project p
      UNION ALL
      SELECT p.project_id2 as project_id, p.Category_ID FROM Project p
     ) p JOIN
     Category c
     ON p.Category_ID = c.Category_ID JOIN
     Funding f
     ON p.project_id = f.project_id
GROUP BY c.project_category, f.status;

Note that this also introduces table aliases and qualified all column references.

Here is a db<>fiddle.