7 Replies Latest reply: Apr 2, 2013 9:36 PM by Frank Kulash RSS

    Creating a scoring column based on threatened species lists

    996454
      I'm doing scoring for sites based on the number of threatened species at each particular site and the threat status of each of the threatened species.

      A threatened species could either have a threat status of nationally_critical, nationally_endangered, or nationally_vulnerable.

      I am trying to write a query will display a column called TOTAL_SCORE where:

      TOTAL_SCORE = (number of Nationally Critical species * 1) + (number of Nationally Endangered species * 0.5) + (number of Nationally Vulnerable species * 0.25)

      I'm able to do this using excel formulas but I have no idea how to go about writing it in SQL. I would have attached the excel spreadsheet but attachments are not allowed in this forum I think?

      See below for descriptions and DDL for the tables I have:

      I have three tables: SITES, SPECIES_LIST, THREAT_STATUS_SCORES

      The SITES table contains the site number and the species present at each particular site:
      SITE_NUMBER    SPECIES_PRESENT
      1            Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis
      2            Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile
      3            Dichelachne_micrantha, Galaxias_pullus
      4            Spio_aequalis
      5            Pachycladon_exile, Geranium_retrorsum
      6            Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei
      7            Drosera_pygmaea
      The SPECIES_LIST table has a list of nationally critical, nationally endangered, and nationally vulnerable species:
      NATIONALLY_CRITICAL               NATIONALLY_ENDANGERED      NATIONALLY_VULNERABLE
      Olearia_pachyphylla               Oligosoma_judgei          Dichelachne_micrantha
      Ophioglossum_petiolatum               Oligosoma_whitakeri          Drosera_pygmaea
      Ourisia_modesta                       Spio_aequalis                  Geranium_retrorsum
      Pachycladon_exile               Galaxias_pullus          Gratiola_concinna
      The THREAT_STATUS_SCORES table shows the score given to each threat status i.e. each nationally vulnerable species gets a score of 1,
      each nationally endangered species gets a score of 0.5, and each nationally vulnerable gets a score of 0.25:
      THREAT_STATUS           SCORE
      NATIONALLY_CRITICAL     1
      NATIONALLY_ENDANGERED     0.5
      NATIONALLY_VULNERABLE     0.25
      DDL:
      CREATE TABLE sites 
      (site_number varchar2(3) not null,
      species_present varchar2(255),
      CONSTRAINT site_number_pk PRIMARY KEY (site_number))
      /
      
      CREATE TABLE species_list
      (nationally_critical varchar2(50),
      nationally_endangered varchar2(50),
      nationally_vulnerable varchar2(50))
      /
      
      CREATE TABLE threat_status_scores
      (threat_status varchar2(50),
      score number(4,2))
      
      /
      
      INSERT INTO sites VALUES (1, Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis);
      INSERT INTO sites VALUES (2, Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile);
      INSERT INTO sites VALUES (3, Dichelachne_micrantha, Galaxias_pullus);
      INSERT INTO sites VALUES (4, Spio_aequalis);
      INSERT INTO sites VALUES (5, Pachycladon_exile, Geranium_retrorsum);
      INSERT INTO sites VALUES (6, Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei);
      INSERT INTO sites VALUES (7, Drosera_pygmaea);
      
      INSERT INTO species_list VALUES (Olearia_pachyphylla, Oligosoma_judgei, Dichelachne_micrantha);
      INSERT INTO species_list VALUES (Ophioglossum_petiolatum, Oligosoma_whitakeri, Drosera_pygmaea);
      INSERT INTO species_list VALUES (Ourisia_modesta, Spio_aequalis, Geranium_retrorsum);
      INSERT INTO species_list VALUES (Pachycladon_exile, Galaxias_pullus, Gratiola_concinna);
      
      INSERT INTO threat_status_scores VALUES (NATIONALLY_CRITICAL, 1);
      INSERT INTO threat_status_scores VALUES (NATIONALLY_ENDANGERED, 0.5);
      INSERT INTO threat_status_scores VALUES (NATIONALLY_VULNERABLE, 0.25);
      Any help on how I could add a TOTAL_SCORE column to the SITES table would be much appreciated.

      I'm using oracle 11g

      Edited by: 993451 on Apr 1, 2013 4:01 PM

      Edited by: 993451 on Apr 1, 2013 4:02 PM
        • 1. Re: Creating a scoring column based on threatened species lists
          996454
          This is what the TOTAL_SCORE should be:

          SPECIES_PRESENT                                                    TOTAL_SCORE
          Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis            2
          Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile    3
          Dichelachne_micrantha, Galaxias_pullus                            0.75
          Spio_aequalis                                                    0.5
          Pachycladon_exile, Geranium_retrorsum                            1.25
          Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei            2.5
          Drosera_pygmaea                                                    0.25
          I did this in Excel but want to do it in SQL as well.
          • 2. Re: Creating a scoring column based on threatened species lists
            sb92075
            993451 wrote:
            This is what the TOTAL_SCORE should be:

            SPECIES_PRESENT                                                    TOTAL_SCORE
            Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis            2
            Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile    3
            Dichelachne_micrantha, Galaxias_pullus                            0.75
            Spio_aequalis                                                    0.5
            Pachycladon_exile, Geranium_retrorsum                            1.25
            Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei            2.5
            Drosera_pygmaea                                                    0.25
            I did this in Excel but want to do it in SQL as well.
            It is generally a Bad Idea to actually store any computed value in any static table.
            Consider to CREATE VIEW where TOTAL_SCORE is computed when ever it is requested.
            • 3. Re: Creating a scoring column based on threatened species lists
              Frank Kulash
              Hi,

              Here's one way to do that in Oracle 11:
              WITH   normalized_species      AS
              (
                   SELECT     *
                   FROM     species_list
                   UNPIVOT     (   species_name
                        FOR score  IN ( nationally_critical     AS 1
                                       , nationally_endangered     AS 0.5
                                       , nationally_vulnerable     AS 0.25
                                   )
                        )
              )
              SELECT       s.species_present
              ,       SUM (n.score)          AS total_score
              FROM       sites                s
              JOIN       normalized_species  n  ON    ', ' || s.species_present || ','
                                           LIKE '%, ' || n.species_name    || ',%' 
              GROUP BY  s.site_number, s.species_present
              ORDER BY  s.site_number
              ;
              This just displays the results. If you want a view that contains these results, then add "CREATE OR REPLACE view_name AS " at the very beginning. If you want a table, add "CREATE TABLE table_name AS " instead. A table will be a fast way to see the results, but it will be incorrect as soon as the tables are changed.

              Neither sites not species list has a good table design. Relational databases work best when each row represents one individual item. In species_list, each row represents 3 separate items. In sites, each row represents 1 or more separate items. You code will be more efficient, and much easier to write and to maintain, if you change these tables so that they have more rows, and each row represents only 1 individual.

              Thanks for posting the CREATE TABLE and INSERT statements. Please test (and, if necessary, correct) them before you post them. None of the INSERT statements work as posted. String literals in Oracle have to be enclosed in single-quotes, like this:
              INSERT INTO species_list (nationally_critical,    nationally_endangered,  nationally_vulnerable)
                     VALUES            ('Olearia_pachyphylla',  'Oligosoma_judgei',        'Dichelachne_micrantha');
              • 4. Re: Creating a scoring column based on threatened species lists
                996454
                Thanks Frank, that's awesome!

                Sorry about the incorrect INSERT statements I will be sure to test them in future.

                Frank Kulash wrote:
                Neither sites not species list has a good table design. Relational databases work best when each row represents one individual item. In species_list, each row represents 3 separate items. In sites, each row represents 1 or more separate items. You code will be more efficient, and much easier to write and to maintain, if you change these tables so that they have more rows, and each row represents only 1 individual.
                I see. So it would be better to split the species_list table into three separate table i.e. nationally_critical_species, nationally_endangered_species, nationally_vulnerable_species:

                I have changed the DDL to:
                CREATE TABLE sites 
                (site_number varchar2(3) not null,
                species_present varchar2(255),
                CONSTRAINT site_number_pk PRIMARY KEY (site_number))
                /
                
                CREATE TABLE threat_status_scores
                (threat_status varchar2(50),
                score number(4,2))
                /
                
                CREATE TABLE nationally_critical_species
                (nationally_critical VARCHAR2(50))
                /
                
                CREATE TABLE nationally_endangered_species
                (nationally_endangered VARCHAR2(50))
                /
                
                CREATE TABLE nationally_vulnerable_species
                (nationally_vulnerable VARCHAR2(50))
                /
                
                INSERT INTO sites VALUES (1, 'Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis');
                INSERT INTO sites VALUES (2, 'Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile');
                INSERT INTO sites VALUES (3, 'Dichelachne_micrantha, Galaxias_pullus');
                INSERT INTO sites VALUES (4, 'Spio_aequalis');
                INSERT INTO sites VALUES (5, 'Pachycladon_exile, Geranium_retrorsum');
                INSERT INTO sites VALUES (6, 'Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei');
                INSERT INTO sites VALUES (7, 'Drosera_pygmaea');
                
                INSERT INTO threat_status_scores VALUES ('NATIONALLY_CRITICAL', 1);
                INSERT INTO threat_status_scores VALUES ('NATIONALLY_ENDANGERED', 0.5);
                INSERT INTO threat_status_scores VALUES ('NATIONALLY_VULNERABLE', 0.25);
                
                INSERT INTO nationally_critical_species VALUES ('Olearia_pachyphylla');
                INSERT INTO nationally_critical_species VALUES ('Ophioglossum_petiolatum');
                INSERT INTO nationally_critical_species VALUES ('Ourisia_modesta');
                INSERT INTO nationally_critical_species VALUES ('Pachycladon_exile');
                
                INSERT INTO nationally_endangered_species VALUES ('Oligosoma_judgei');
                INSERT INTO nationally_endangered_species VALUES ('Oligosoma_whitakeri');
                INSERT INTO nationally_endangered_species VALUES ('Spio_aequalis');
                INSERT INTO nationally_endangered_species VALUES ('Galaxias_pullus');
                
                INSERT INTO nationally_vulnerable_species VALUES ('Dichelachne_micrantha');
                INSERT INTO nationally_vulnerable_species VALUES ('Drosera_pygmaea');
                INSERT INTO nationally_vulnerable_species VALUES ('Geranium_retrorsum');
                INSERT INTO nationally_vulnerable_species VALUES ('Gratiola_concinna');
                However, I'm not sure how to change the table design of the SITES table to make it better? At the moment it still looks like this:
                SITE_NUMBER       SPECIES_PRESENT
                1                Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis
                2                Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile
                3                Dichelachne_micrantha, Galaxias_pullus
                4                Spio_aequalis
                5                Pachycladon_exile, Geranium_retrorsum
                6                Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei
                7                Drosera_pygmaea
                The only thing I can think of is instead of having a single SPECIES_PRESENT column I could have series of columns called SPECIES1, SPECIES2...etc each listing a single species.

                The problem is that in the actual sites table some sites have a very large number of species so if I did it this way then then there would be a lot of columns.
                • 5. Re: Creating a scoring column based on threatened species lists
                  Frank Kulash
                  993451 wrote:
                  Thanks Frank, that's awesome!

                  Sorry about the incorrect INSERT statements I will be sure to test them in future.

                  Frank Kulash wrote:
                  Neither sites not species list has a good table design. Relational databases work best when each row represents one individual item. In species_list, each row represents 3 separate items. In sites, each row represents 1 or more separate items. You code will be more efficient, and much easier to write and to maintain, if you change these tables so that they have more rows, and each row represents only 1 individual.
                  I see. So it would be better to split the species_list table into three separate table i.e. nationally_critical_species, nationally_endangered_species, nationally_vulnerable_species:
                  Why not have one table, but a separate row for each species, with the level of endangerment (if any) being an attribiute of the species? The distinction between citically endagered species and vulnerable species is important, of course, but aren't the two kinds iof species fundamentally the same kind of entity? Don't they have the same kinds of attributes? Aren't they used in the same way?
                  Here's one possible table design:
                  CREATE TABLE  species
                  (   species_id          NUMBER (10)     PRIMARY KEY
                  ,   species_name     VARCHAR2 (50)     UNIQUE
                  ,   threat_level     VARCHAR2 (1)     CONSTRAINT  species_danger_chk
                                            CHECK  (threat_level IN ( 'C'
                                                                 , 'E'
                                                          , 'V'
                                                          )
                                                  )
                  );
                  
                  INSERT INTO species (species_id, species_name,          threat_level)
                               VALUES (1,          'Olearia pachyphylla', 'C');
                  INSERT INTO species (species_id, species_name,          threat_level)
                               VALUES (2,          'Oligosoma judgei',     'E');
                  INSERT INTO species (species_id, species_name,          threat_level)
                               VALUES (3,          'Aster alpinus',     NULL);
                  ...
                  Since species_name is unique, and (I imagine) it is unlikely to change, then there might be an argument for not having a species_id, and making species_name the primary key.
                  Here, threat_level s the primary key of the threat_status_score table.
                  ... However, I'm not sure how to change the table design of the SITES table to make it better? ...
                  The only thing I can think of is instead of having a single SPECIES_PRESENT column I could have series of columns called SPECIES1, SPECIES2...etc each listing a single species.

                  The problem is that in the actual sites table some sites have a very large number of species so if I did it this way then then there would be a lot of columns.
                  You're right; that would be a problem. The number of columns in a table is fixed, and while that number can be adjusted, it's not something that people do all the time. Tables can have as many rows as necessary, and adding or removing rows is easy. Put each sighting on a separate row, something like this:
                  CREATE TABLE species_present
                  (   site_number          NUMBER (10)
                  ,   species_id          NUMBER (10)
                  ,   CONSTRAINT species_present_pk PRIMARY KEY ( site_number
                                                              , species_id
                                                 )
                  );
                  
                  INSERT INTO species_present (site_number, species_id) VALUES (1, 1);
                  INSERT INTO species_present (site_number, species_id) VALUES (1, 4);
                  INSERT INTO species_present (site_number, species_id) VALUES (1, 9);
                  INSERT INTO species_present (site_number, species_id) VALUES (4, 9);
                  If you're using Oracle 11.2, then the query for total_scores might be:
                  SELECT    pr.site_number
                  ,       LISTAGG (s.species_name, ', ') WITHIN GROUP (ORDER BY s.species_name)
                                             AS species_present
                  ,       SUM (t.score)          AS total_score
                  FROM       species_present     sp
                  JOIN       species          s   ON   s.species_id    = sp.secies_id
                  JOIN       threat_status_scores     t   ON   t.threat_level  = s.threat_level
                  GROUP BY  pr.site_number
                  ORDER BY  pr.site_number
                  ;
                  Sorry, I'm not at an Oracle 11.2 database now, so I can't test this.

                  Edited by: Frank Kulash on Apr 2, 2013 6:16 AM
                  Added query
                  • 6. Re: Creating a scoring column based on threatened species lists
                    996454
                    If you're using Oracle 11.2, then the query for total_scores might be:
                    
                    SELECT    pr.site_number
                    ,       LISTAGG (s.species_name, ', ') WITHIN GROUP (ORDER BY s.species_name)
                                               AS species_present
                    ,       SUM (t.score)          AS total_score
                    FROM       species_present     sp
                    JOIN       species          s   ON   s.species_id    = sp.secies_id
                    JOIN       threat_status_scores     t   ON   t.threat_level  = s.threat_level
                    GROUP BY  pr.site_number
                    ORDER BY  pr.site_number
                    Thanks for this. This works great, all I needed to do was change a minor spelling error and not include the pr prefix at the start of site_number. What is the pr prefix? Sorry, I'm a beginner at sql.

                    Here is the new full DDL based on your post:
                    CREATE TABLE  species
                    (   species_id          NUMBER (10)     PRIMARY KEY
                    ,   species_name     VARCHAR2 (50)     UNIQUE
                    ,   threat_level     VARCHAR2 (1)     CONSTRAINT  species_danger_chk
                                              CHECK  (threat_level IN ( 'C'
                                                      , 'E'
                                                      , 'V'
                                                      )
                                                    )
                    );
                     
                    INSERT INTO species (species_id, species_name,          threat_level)
                                 VALUES (1,          'Olearia pachyphylla', 'C');
                    INSERT INTO species (species_id, species_name,          threat_level)
                                 VALUES (2,          'Oligosoma judgei',     'E');
                    INSERT INTO species (species_id, species_name,          threat_level)
                                 VALUES (3,          'Aster alpinus',     NULL);
                    
                    ---
                    
                    CREATE TABLE species_present
                    (   site_number          NUMBER (10)
                    ,   species_id          NUMBER (10)
                    ,   CONSTRAINT species_present_pk PRIMARY KEY ( site_number
                                                                , species_id
                                                   )
                    );
                     
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 1);
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 4);
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 9);
                    INSERT INTO species_present (site_number, species_id) VALUES (4, 9);
                    
                    ---
                    
                    CREATE TABLE threat_status_scores
                    (threat_level varchar2(50),
                    score number(4,2));
                    /
                    INSERT INTO threat_status_scores VALUES ('C', 1);
                    INSERT INTO threat_status_scores VALUES ('E', 0.5);
                    INSERT INTO threat_status_scores VALUES ('V', 0.25);
                    
                    ----------
                    
                    SELECT    site_number
                    ,       LISTAGG (s.species_name, ', ') WITHIN GROUP (ORDER BY s.species_name)
                                               AS species_present
                    ,       SUM (t.score)          AS total_score
                    FROM       species_present     sp
                    JOIN       species          s   ON   s.species_id    = sp.species_id
                    JOIN       threat_status_scores     t   ON   t.threat_level  = s.threat_level
                    GROUP BY  site_number
                    ORDER BY  site_number;
                    >
                    Why not have one table, but a separate row for each species, with the level of endangerment (if any) being an attribiute of the species? The distinction between >citically endagered species and vulnerable species is important, of course, but aren't the two kinds iof species fundamentally the same kind of entity? Don't they >have the same kinds of attributes? Aren't they used in the same way?
                    Here's one possible table design:
                    You're right. It is much better to have a single table for threatened species with three columns: species_id, species_name, and threat_level.
                    Since species_name is unique, and (I imagine) it is unlikely to change, then there might be an argument for not having a species_id, and making species_name the >primary key.
                    Species_names are unique, however believe it or not they do sometimes change - at least in New Zealand! The botanists and ecologists put out updated species lists every few years and occassionally they will change species names for various reasons.

                    You're right; that would be a problem. The number of columns in a table is fixed, and while that number can be adjusted, it's not something that people do all the >time. Tables can have as many rows as necessary, and adding or removing rows is easy. Put each sighting on a separate row, something like this:
                    CREATE TABLE species_present
                    (   site_number          NUMBER (10)
                    ,   species_id          NUMBER (10)
                    ,   CONSTRAINT species_present_pk PRIMARY KEY ( site_number
                                                                , species_id
                                                   )
                    );
                     
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 1);
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 4);
                    INSERT INTO species_present (site_number, species_id) VALUES (1, 9);
                    INSERT INTO species_present (site_number, species_id) VALUES (4, 9);
                    I think that this is a good idea. However, I'm not allowed to change sites table (I should have mentioned this earlier).
                    What happens is that contractors collect the threatened species information and they have already put the table in this format:
                    CREATE TABLE sites 
                    (site_number varchar2(3) not null,
                    species_present varchar2(255),
                    CONSTRAINT site_number_pk PRIMARY KEY (site_number))
                    / 
                    INSERT INTO sites VALUES (1, 'Olearia_pachyphylla, Oligosoma_whitakeri, Spio_aequalis');
                    INSERT INTO sites VALUES (2, 'Ophioglossum_petiolatum, Ourisia_modesta, Pachycladon_exile');
                    INSERT INTO sites VALUES (3, 'Dichelachne_micrantha, Galaxias_pullus');
                    INSERT INTO sites VALUES (4, 'Spio_aequalis');
                    INSERT INTO sites VALUES (5, 'Pachycladon_exile, Geranium_retrorsum');
                    INSERT INTO sites VALUES (6, 'Olearia_pachyphylla, Ourisia_modesta, Oligosoma_judgei');
                    INSERT INTO sites VALUES (7, 'Drosera_pygmaea');
                    • 7. Re: Creating a scoring column based on threatened species lists
                      Frank Kulash
                      Hi,
                      993451 wrote:
                      Thanks for this. This works great, all I needed to do was change a minor spelling error and not include the pr prefix at the start of site_number. What is the pr prefix? Sorry, I'm a beginner at sql.
                      My mistake; I was thinking of using the table alias pr for species_<b>PR</b>esent. That's the kind of mistake I'm likely to make when I can't test something.
                      CREATE TABLE threat_status_scores
                      (threat_level varchar2(50),
                      score number(4,2));
                      /
                      INSERT INTO threat_status_scores VALUES ('C', 1);
                      INSERT INTO threat_status_scores VALUES ('E', 0.5);
                      INSERT INTO threat_status_scores VALUES ('V', 0.25);
                      Keep the full description of the threat level. You might not need it for this query, but it will probably be useful in other queries. You can't expect all the users to be familiar with the short, cryptic keys (like 'C') that you store in the other tables. How about this?
                      CREATE TABLE threat_status
                      (   threat_level     VARCHAR2 (5)     PRIMARY KEY
                      ,   description          VARCHAR2 (50)     UNIQUE
                      ,   score           NUMBER (4,2)
                      );
                      
                      INSERT INTO threat_status (threat_level, description,    score)
                                         VALUES ('C', 'Nationally Critical',      1);
                      INSERT INTO threat_status (threat_level, description,    score)
                                              VALUES ('E', 'Nationally Endangered', 0.5);
                      INSERT INTO threat_status (threat_level, description,    score)
                                         VALUES ('V', 'Nationally Vulnerable', 0.25);
                      I like the name threat_status (or even threat) better than threat_status_<b>scores</b>. Score is only 1 attribute of the table.
                      ... Species_names are unique, however believe it or not they do sometimes change - at least in New Zealand! The botanists and ecologists put out updated species lists every few years and occassionally they will change species names for various reasons.
                      That's very interesting! I'm glad you posted it.
                      ... I think that this is a good idea. However, I'm not allowed to change sites table (I should have mentioned this earlier).
                      What happens is that contractors collect the threatened species information and they have already put the table in this format:
                      Okay, if that table can't change, then create a new table that has the same data, but in a normalized form. Better to do it once, and store the results in an easy-to-use table, than to do it in every query where you need the data normalized.