I’m working on creating various 0/1 product flags category variables based on multiple substrings within a string field. The string values contained in the category_description field don’t stick to a consistent classification pattern. Notice for each flag variable I’m using two regexp_like statements. The two things I’m trying to accomplish are.
- For each category flag variable combine the two regexp_like statements into only one regexp_like statement
- I’m still trying to correct some of the issues I’m running to with the regex syntax.
- Swimsuit classifying categories which actually dress suits and not swimsuits or flagging jumpsuits as swimsuits
- A category is being flagged as swimwear when it represents just shorts. Example: Men > Clothing > Shorts & Swimwear > Shorts
- Situations where there are no spaces between substrings. Example: where the description Layette/Infant Unisex - Accessories – Sale is not being flagged as baby when it contains the word infant
Below is my current code, sample data, and my desired output. Any assistance will be greatly appreciated. Thanks!
Sample data:
CREATE TABLE category_flags
(
category_description VARCHAR(80)
);
INSERT INTO category_flags
(category_description)
VALUES
("Baby > Basics > Accessories > Hats,Scarves, & Gloves")
("Men > Shoes & Accessories > Hats")
("Men > Shoes & Accessories > Ties")
("Women > Clothing > Jackets")
("Women > Shoes & Accessories > Hats")
("Kids > Boys > Jackets")
("Tall/Men/Outerwear/Cloth/Sale")
("Men > Bottoms > Swim")
("Men > Bottoms > Swimwear")
("US/Big_Tall/Men/Chinos/Flat_Front")
("men>clothing>sweatshirts")
("men>clothing>swimwear")
("Children/Girls/Outfit")
("Men > Accessories > Fragrance")
("Black_Top/Men/Dress_Shirts/Slim_Fit/Sale")
("Men > Bottoms > Suits, Sport Coats & Trousers")
("Women > Tops > Sweaters")
("Kids > Accessories > Boy's Accessories > Bags")
("boys>accessories>socks")
("Blue_Look/Women/Pants/Sale")
("women > Clothing/Knits/> Jackets")
("Women > Clothing > Jackets")
("Women > Shoes & Accessories > Hats")
("Women > Clothing > Jackets")
("Dresswear/Women > Belts")
("clothing > men >shorts & swim trunks > swimtrunks")
("Men > Clothing > Shorts & Swimwear > Shorts")
("men>clothing>swimsuits")
("boys>accessories>socks")
("Layette/Infant Unisex - Accessories")
("Layette/Infant Unisex - Accessories - Sale")
("Basics > Baby Boy & Girl> Accessories > Hats,Scarves, & Gloves")
("Women/Clothing & Shoes/Swimwear")
("Evening/Men > Shoes & Accessories > Ties")
("Summer/Child/Jumpers")
("women/fall/clothing & accessories/jumpsuits")
("Women/Clothing & Swimwear/Skirts")
);
Current script:
select
category_description,
case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and not regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Mens_Apparel,
case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Mens_Accessories,
case when regexp_like (category_description, 'women','i')
and not regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Womens_Apprarel,
case when regexp_like (category_description, 'women','i')
and regexp_like(category_description, 'accessories|hats|bags|belts|shoes|socks|ties' ,'i') then 1 else 0 end as Womens_Accessories ,
case when regexp_like (category_description, 'boys?|girls?|junior|child(ren)?|kids?','i')
and not regexp_like (category_description,'infant|babys?|hidden|womens?|mens?|gift cards?|test','i') then 1 else 0 end as Kids_Products ,
case when regexp_like(category_description,'toddler|infant|babys?|','i')
and not regexp_like (category_description,'hidden|womens|mens|^gift cards','i') then 1 else 0 end as Baby_Products,
case when regexp_like(category_description, '(.[^o]|[^w]o)men|^men','i')
and regexp_like(category_description,'swim(wear)?|swim(suit)?','i')then 1 else 0 end as Mens_Swimwear ,
case when regexp_like(category_description,'womens','i')
and regexp_like(category_description,'(swim(wear|suits))','i') then 1 else 0 end as Womens_Swimwear
from category_flags
category_description | Mens_Apparel | Womens_Apprarel | Mens_Accessories | Womens_Accessories | Mens_Swim | Womens_Swim | Kids_Products | Baby_Products |
Baby > Basics > Accessories > Hats,Scarves, & Gloves | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
Men > Shoes & Accessories > Hats | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Men > Shoes & Accessories > Ties | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Women > Clothing > Jackets | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Women > Shoes & Accessories > Hats | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
Kids > Boys > Jackets | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Tall/Men/Outerwear/Cloth/Sale | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Men > Bottoms > Swim | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
Men > Bottoms > Swimwear | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
US/Big_Tall/Men/Chinos/Flat_Front | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
men>clothing>sweatshirts | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
men>clothing>swimwear | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
Children/Girls/Outfit | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
Men > Accessories > Fragrance | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
Black_Top/Men/Dress_Shirts/Slim_Fit/Sale | 1 | 0 | 0 | 0 | 0 |
|