Is there an SQL command to count frequencies of a value in different columns?
I have a very large dataset of donations to educational projects. I have done some processing and for this question there are three tables of interest: Project, Funding and Category.
- Project contains the project ID, some other negligible info (e.g. date started), and the category ID the project belongs to. Projects can belong to one or two categories, and so there are two columns for each project. If a project only belongs to one category, category 2 is NULL. In total there are 8 categories, with category ID's going from 1 to 8.
- Funding contains the project ID, some other negligible info (e.g. total cost), and the current status of the project. This is either 'fully funded' or 'expired', as all projects are done.
- Category only contains 2 columns, one with the 8 category ID's and the other with the category names (1 - Sports, 2 - Science, etc).
*Project* project_id category_id1 category_id2 ... ... ... ... ... ... *Funding* project_id status ... ... ... ... *Category* Category_ID project_category ... ... ... ...
I'm now trying to find out for each category the percentage of those fully funded, which would be (fully funded) / (fully funded + expired). However, I can't seem to find a way to make SQL count instances for each category regardless of whether they are in category column 1 or category column 2 of 'Project' table. This is the code I have so far with its output:
SELECT project_category, status, count(project_category) FROM Project INNER JOIN Category ON Project.Category_ID1 = Category.Category_ID INNER JOIN Funding ON Project.project_id = Funding.project_id GROUP BY project_category, status
project_category status count(project_category) Applied Learning Expired 4003 Applied Learning Fully Funded 11441 Essentials Expired 16 Essentials Fully Funded 219 Health & Sports Expired 1235 Health & Sports Fully Funded 4518 ... .... ... ... .... ...
This output only counts the categories from project.category_id1. I could just make another table for project.category_id2 and add them up manually, but I would rather have it one table. Is there a way to do this? Thanks for trying to help!!
You can unpivot and then aggregate:
SELECT c.project_category, f.status, count(*) FROM (SELECT p.project_id1 as project_id, p.Category_ID FROM Project p UNION ALL SELECT p.project_id2 as project_id, p.Category_ID FROM Project p ) p JOIN Category c ON p.Category_ID = c.Category_ID JOIN Funding f ON p.project_id = f.project_id GROUP BY c.project_category, f.status;
Note that this also introduces table aliases and qualified all column references.
Here is a db<>fiddle.