7 Replies Latest reply: Oct 18, 2012 9:44 AM by Sherry Lamonica-Oracle

    Creating a subset in R

    968597
      I am attempting to create a subset in R. I have a list of hospitals, and which state they are in. I want my subset to only include the states which contain 20 or more hospitals. My data is called outcome, and currently the formula I am using looks like this:
      outcome2<-subset(outcome, State>=20)
      When I then type: > table(outcome2$State), my table in outcome2 is the same as outcome, with none of the data being removed.

      I believe I have to sum up the number of hospitals in each state before I try to create a subset because it is looking at each row in the data and reading AK,AL,.... but I don't know how to do this.

      Anyone able to help?
        • 1. Re: Creating a subset in R
          Sherry Lamonica-Oracle
          Is your State variable numeric? Comparing against 20 with >= is only meaningful for numeric variables. Here's a toy example to demonstrate:

          # Create the outcome data frame
          outcome <- data.frame(State = c(10, 20, 30), StateName = c("AL", "AK", "AR"))
          outcome
          State StateName
          1 10 AL
          2 20 AK
          3 30 AR

          # Check the class of each variable. State is numeric and StateName is a categorical variable (factor).
          sapply(outcome, data.class)
          State StateName
          "numeric" "factor"
          outcome2 <- subset(outcome, State >=20)
          outcome2
          State StateName
          2 20 AK
          3 30 AR

          Can you show us the first few rows of your data by typing at the R console:
          head(outcome)
          and the class of each variable:
          sapply(outcome, data.class)
          This will help us narrow down the problem. Thanks!

          Sherry
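One possible reason the original call removed nothing, assuming State was read in as a character column (the thread never confirms its class): R converts 20 to the string "20" and compares text, and in typical locales letters sort after digits, so every row passes. A small sketch with made-up data frames (outcome_chr and outcome_fct are illustrative names, not the real data):

```r
# Hypothetical data: State stored as character
outcome_chr <- data.frame(State = c("AK", "AL", "AR"), stringsAsFactors = FALSE)

# 20 is coerced to "20"; the comparison is on text, not on hospital counts
outcome_chr$State >= 20

# Every row survives, so the "subset" has as many rows as the input
kept_chr <- subset(outcome_chr, State >= 20)
nrow(kept_chr) == nrow(outcome_chr)

# With a factor column, >= is not meaningful: R warns and returns NA,
# and subset() drops rows whose condition is NA
outcome_fct <- data.frame(State = factor(c("AK", "AL", "AR")))
kept_fct <- suppressWarnings(subset(outcome_fct, State >= 20))
nrow(kept_fct)
```

Either way, the comparison never looks at how many hospitals a state has, which is why the counts have to be computed first.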
          • 2. Re: Creating a subset in R
            968597
            Hi Sherry,

            Thanks for your reply, although I may not have been completely clear on what I am trying to do, so I can try to explain again.
            The data source that I have is on the number of hospitals in the USA. One of my columns is the State names. From typing:
            table(outcome$State)
            I can see how many hospitals in each state there are. My data shows AK = 17, AL=98, AR=77 ... WY=29.
            I need to remove the states from the data that have under 20 hospitals.

            If I assign a number to each of these states (1:50) and continue to use the formula below, it will remove the first 19 states from the data rather than the states with under 20 hospitals:
            outcome2<-subset(outcome, State>=20)

            Cheers
            Sean
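The idea described here, counting hospitals per state and then filtering on that count, can be sketched in base R with ave(). This is only an illustration: outcome_demo is a made-up stand-in with one row per hospital, using three of the counts quoted above (AK = 17, AL = 98, WY = 29).

```r
# Hypothetical stand-in for the hospital data: one row per hospital
outcome_demo <- data.frame(
  State = rep(c("AK", "AL", "WY"), times = c(17, 98, 29)),
  stringsAsFactors = FALSE
)

# For every row, the number of rows that share that row's State
n_in_state <- ave(seq_along(outcome_demo$State), outcome_demo$State, FUN = length)

# Keep only rows from states with 20 or more hospitals
outcome_demo2 <- outcome_demo[n_in_state >= 20, , drop = FALSE]
table(outcome_demo2$State)
```

Because n_in_state has one value per row, it can be compared with 20 directly, which is what the original subset() call was missing.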
            • 3. Re: Creating a subset in R
              Sherry Lamonica-Oracle
              Thanks for clarifying, Sean. Please paste the output after running:

              head(outcome)

              and I will reply with the required syntax.

              Best Regards,

              Sherry
              • 4. Re: Creating a subset in R
                968597
                There are a lot of columns in the data. Column 7 has the state name. Here is the output for head(outcome):

                Provider.Number Hospital.Name Address.1
                1 010001 SOUTHEAST ALABAMA MEDICAL CENTER 1108 ROSS CLARK CIRCLE
                2 010005 MARSHALL MEDICAL CENTER SOUTH 2505 U S HIGHWAY 431 NORTH
                3 010006 ELIZA COFFEE MEMORIAL HOSPITAL 205 MARENGO STREET
                4 010007 MIZELL MEMORIAL HOSPITAL 702 N MAIN ST
                5 010008 CRENSHAW COMMUNITY HOSPITAL 101 HOSPITAL CIRCLE
                6 010010 MARSHALL MEDICAL CENTER NORTH 8000 ALABAMA HIGHWAY 69
                Address.2 Address.3 City State ZIP.Code County.Name Phone.Number
                1 DOTHAN AL 36301 HOUSTON 3347938701
                2 BOAZ AL 35957 MARSHALL 2565938310
                3 FLORENCE AL 35631 LAUDERDALE 2567688400
                4 OPP AL 36467 COVINGTON 3344933541
                5 LUVERNE AL 36049 CRENSHAW 3343353374
                6 GUNTERSVILLE AL 35976 MARSHALL 2565718000
                Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1 14.3
                2 18.5
                3 18.1
                4 Not Available
                5 Not Available
                6 Not Available
                Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1 No Different than U.S. National Rate
                2 No Different than U.S. National Rate
                3 No Different than U.S. National Rate
                4 Number of Cases Too Small
                5 Number of Cases Too Small
                6 Number of Cases Too Small
                Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1 12.1
                2 14.7
                3 14.8
                4 Not Available
                5 Not Available
                6 Not Available
                Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1 17.0
                2 23.0
                3 21.8
                4 Not Available
                5 Not Available
                6 Not Available
                Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1 666
                2 44
                3 329
                4 14
                5 9
                6 22
                Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
                1
                2
                3
                4 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                5 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                6 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1 11.4
                2 15.2
                3 11.3
                4 13.6
                5 13.8
                6 12.5
                Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1 No Different than U.S. National Rate
                2 Worse than U.S. National Rate
                3 No Different than U.S. National Rate
                4 No Different than U.S. National Rate
                5 No Different than U.S. National Rate
                6 No Different than U.S. National Rate
                Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1 9.5
                2 12.2
                3 9.1
                4 10.0
                5 9.9
                6 9.9
                Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1 13.7
                2 18.8
                3 13.9
                4 18.2
                5 18.7
                6 15.6
                Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1 741
                2 234
                3 523
                4 113
                5 53
                6 163
                Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
                1
                2
                3
                4
                5
                6
                Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1 10.9
                2 13.9
                3 13.4
                4 14.9
                5 15.8
                6 8.7
                Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1 No Different than U.S. National Rate
                2 No Different than U.S. National Rate
                3 No Different than U.S. National Rate
                4 No Different than U.S. National Rate
                5 No Different than U.S. National Rate
                6 Better than U.S. National Rate
                Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1 8.6
                2 11.3
                3 11.2
                4 11.6
                5 11.4
                6 6.8
                Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1 13.7
                2 17.0
                3 15.8
                4 19.0
                5 21.5
                6 11.0
                Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1 371
                2 372
                3 836
                4 239
                5 61
                6 315
                Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
                1
                2
                3
                4
                5
                6
                Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1 19.0
                2 Not Available
                3 17.8
                4 Not Available
                5 Not Available
                6 Not Available
                Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1 No Different than U.S. National Rate
                2 Number of Cases Too Small
                3 No Different than U.S. National Rate
                4 Number of Cases Too Small
                5 Number of Cases Too Small
                6 Number of Cases Too Small
                Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1 16.6
                2 Not Available
                3 14.9
                4 Not Available
                5 Not Available
                6 Not Available
                Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1 21.7
                2 Not Available
                3 21.5
                4 Not Available
                5 Not Available
                6 Not Available
                Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1 728
                2 21
                3 342
                4 1
                5 4
                6 13
                Footnote...Hospital.30.Day.Readmission.Rates.from.Heart.Attack
                1
                2 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                3
                4 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                5 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                6 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
                Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1 23.7
                2 22.5
                3 19.8
                4 27.1
                5 24.7
                6 23.9
                Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1 No Different than U.S. National Rate
                2 No Different than U.S. National Rate
                3 Better than U.S. National Rate
                4 No Different than U.S. National Rate
                5 No Different than U.S. National Rate
                6 No Different than U.S. National Rate
                Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1 21.3
                2 19.2
                3 17.2
                4 22.4
                5 19.9
                6 20.1
                Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1 26.5
                2 26.1
                3 22.9
                4 31.9
                5 30.2
                6 28.2
                Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1 891
                2 264
                3 614
                4 135
                5 59
                6 173
                Footnote...Hospital.30.Day.Readmission.Rates.from.Heart.Failure
                1
                2
                3
                4
                5
                6
                Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1 17.1
                2 17.6
                3 16.9
                4 19.4
                5 18.0
                6 18.7
                Comparison.to.U.S..Rate...Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1 No Different than U.S. National Rate
                2 No Different than U.S. National Rate
                3 No Different than U.S. National Rate
                4 No Different than U.S. National Rate
                5 No Different than U.S. National Rate
                6 No Different than U.S. National Rate
                Lower.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1 14.4
                2 15.0
                3 14.7
                4 15.9
                5 14.0
                6 15.7
                Upper.Readmission.Estimate...Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1 20.4
                2 20.6
                3 19.5
                4 23.2
                5 22.8
                6 22.2
                Number.of.Patients...Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1 400
                2 374
                3 842
                4 254
                5 56
                6 326
                Footnote...Hospital.30.Day.Readmission.Rates.from.Pneumonia
                1
                2
                3
                4
                5
                6
                >

                Thanks
                Sean
                • 5. Re: Creating a subset in R
                  Sherry Lamonica-Oracle
                  Hi Sean,

                  Here's some sample code that subsets the data based on the factor level counts in the 'State' variable:

                  outcome <- data.frame(State = rep(c("AL", "AK", "AR"), c(30, 10, 5)), Var2 = rnorm(45))
                  head(outcome)
                  tab <- table(outcome$State)
                  tab
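                  # Names of the states with 20 or more rows (hospitals)
                  mynames <- names(tab)[tab >= 20]
                  mynames
                  # Keep only the rows whose State is in that list
                  outcome.sub <- outcome[outcome$State %in% mynames,]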
                  outcome.sub

                  You can modify this code for your data frame. I hope it helps.

                  Sherry
                  • 6. Re: Creating a subset in R
                    968597
                    Hi Sherry,

                    I understand how that formula would work, but I have all 50 states, not 3. There must be a way of doing this without manually typing in the number of hospitals for each of the 50 states before excluding those with fewer than 20. Otherwise, I would just manually list the states that have 20 or more. Does this make sense?
                    • 7. Re: Creating a subset in R
                      Sherry Lamonica-Oracle
                      Sean,

                      If you are referring to this syntax in my example:

                      outcome <- data.frame(State = rep(c("AL", "AK", "AR"), c(30, 10, 5)), Var2 = rnorm(45))

                      This is just a toy data frame to simulate your data. You won't need to type each state abbreviation because your data frame, 'outcome', already exists in your global environment. Try this using your outcome data instead of my test data.

                      Hope it helps,

                      Sherry
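To illustrate that nothing has to be typed per state, here is a rough sketch that runs the same table-based steps on simulated data covering all 50 states. The data frame sim, the seed, and the random counts are made up for the demonstration; state.abb is R's built-in vector of state abbreviations.

```r
set.seed(1)  # arbitrary seed so the simulated counts are repeatable

# Simulated stand-in for the real data: 50 states with made-up hospital counts
n_per_state <- sample(5:100, length(state.abb), replace = TRUE)
sim <- data.frame(State = rep(state.abb, times = n_per_state),
                  stringsAsFactors = FALSE)

# Count rows per state, then keep the states that reach the threshold
tab <- table(sim$State)
keep <- names(tab)[tab >= 20]
sim.sub <- sim[sim$State %in% keep, , drop = FALSE]

# Smallest remaining state count; should be at least 20
min(table(sim.sub$State))
```

With the real data, only the lines from tab onward would be needed, with outcome in place of sim.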