7 Replies Latest reply: Oct 18, 2012 7:44 AM by Sherry LaMonica

# Creating a subset in R

I am attempting to create a subset in R. I have a list of hospitals, and which state they are in. I want my subset to only include the states which contain 20 or more hospitals. My data is called outcome, and currently the formula I am using looks like this:
outcome2<-subset(outcome, State>=20)
When I then type table(outcome2$State), the table for outcome2 is the same as the one for outcome; none of the data has been removed.

I believe I have to count the hospitals in each state before I create the subset, because at the moment the comparison looks at each row of the data and reads AK, AL, ..., but I don't know how to do this.

Anyone able to help?
• ###### 1. Re: Creating a subset in R
Is your State variable numeric? A comparison such as State >= 20 only does what you expect when State is numeric. Here's a toy example to demonstrate:

# Create the outcome data frame
outcome <- data.frame(State = c(10, 20, 30), StateName = c("AL", "AK", "AR"))
outcome
State StateName
1 10 AL
2 20 AK
3 30 AR

# Check the class of each variable. State is numeric and StateName is a categorical variable (factor).
# (From R 4.0.0 on, StateName shows as "character" unless stringsAsFactors = TRUE is set.)
sapply(outcome, data.class)
State StateName
"numeric" "factor"
outcome2 <- subset(outcome, State >=20)
outcome2
State StateName
2 20 AK
3 30 AR
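
For contrast, here is a minimal sketch of what the same comparison does if State holds abbreviations rather than numbers. The data frame name toy is made up for illustration, and stringsAsFactors = FALSE is set explicitly so the column is character in every R version. R converts the 20 to the string "20" and compares text, so the result says nothing about hospital counts:

```r
# Hypothetical toy data: State is a character column of abbreviations
toy <- data.frame(State = c("AK", "AL", "AR"), stringsAsFactors = FALSE)

# The number 20 is coerced to the string "20", then the strings are
# compared by collation order, not by how many hospitals a state has
cmp <- toy$State >= 20
cmp

# A quick check that makes the problem visible before subsetting
is.numeric(toy$State)
```

In most locales digits sort before letters, so every row would likely pass the test and the subset would come back unchanged, which is consistent with what the original question describes.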

Can you show us the first few rows of your data by typing at the R console:
head(outcome)
and the class of each variable:
sapply(outcome, data.class)
This will help us narrow down the problem. Thanks!

Sherry
• ###### 2. Re: Creating a subset in R
Hi Sherry,

Thanks for your reply. I may not have been completely clear about what I am trying to do, so let me try to explain again.
My data lists hospitals in the USA, and one of the columns holds the state each hospital is in. By typing:
table(outcome$State)
I can see how many hospitals there are in each state. My data shows AK = 17, AL = 98, AR = 77 ... WY = 29.
I need to remove from the data the states that have fewer than 20 hospitals.

If I number the states 1:50 and keep using the command below, it will remove the first 19 states from the data rather than the states with fewer than 20 hospitals:
outcome2<-subset(outcome, State>=20)

Cheers
Sean
• ###### 3. Re: Creating a subset in R
Thanks for clarifying, Sean. Please paste the output after running:

head(outcome)

and I will reply with the required syntax.

Best Regards,

Sherry
• ###### 4. Re: Creating a subset in R
There are a lot of columns in the data. Column 7 has the state name. Here is the output for head(outcome):

1 010001 SOUTHEAST ALABAMA MEDICAL CENTER 1108 ROSS CLARK CIRCLE
2 010005 MARSHALL MEDICAL CENTER SOUTH 2505 U S HIGHWAY 431 NORTH
3 010006 ELIZA COFFEE MEMORIAL HOSPITAL 205 MARENGO STREET
4 010007 MIZELL MEMORIAL HOSPITAL 702 N MAIN ST
5 010008 CRENSHAW COMMUNITY HOSPITAL 101 HOSPITAL CIRCLE
6 010010 MARSHALL MEDICAL CENTER NORTH 8000 ALABAMA HIGHWAY 69
1 DOTHAN AL 36301 HOUSTON 3347938701
2 BOAZ AL 35957 MARSHALL 2565938310
3 FLORENCE AL 35631 LAUDERDALE 2567688400
4 OPP AL 36467 COVINGTON 3344933541
5 LUVERNE AL 36049 CRENSHAW 3343353374
6 GUNTERSVILLE AL 35976 MARSHALL 2565718000
Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1 14.3
2 18.5
3 18.1
4 Not Available
5 Not Available
6 Not Available
Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1 No Different than U.S. National Rate
2 No Different than U.S. National Rate
3 No Different than U.S. National Rate
4 Number of Cases Too Small
5 Number of Cases Too Small
6 Number of Cases Too Small
Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1 12.1
2 14.7
3 14.8
4 Not Available
5 Not Available
6 Not Available
Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1 17.0
2 23.0
3 21.8
4 Not Available
5 Not Available
6 Not Available
Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1 666
2 44
3 329
4 14
5 9
6 22
Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Attack
1
2
3
4 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
5 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
6 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1 11.4
2 15.2
3 11.3
4 13.6
5 13.8
6 12.5
Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1 No Different than U.S. National Rate
2 Worse than U.S. National Rate
3 No Different than U.S. National Rate
4 No Different than U.S. National Rate
5 No Different than U.S. National Rate
6 No Different than U.S. National Rate
Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1 9.5
2 12.2
3 9.1
4 10.0
5 9.9
6 9.9
Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1 13.7
2 18.8
3 13.9
4 18.2
5 18.7
6 15.6
Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1 741
2 234
3 523
4 113
5 53
6 163
Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Heart.Failure
1
2
3
4
5
6
Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1 10.9
2 13.9
3 13.4
4 14.9
5 15.8
6 8.7
Comparison.to.U.S..Rate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1 No Different than U.S. National Rate
2 No Different than U.S. National Rate
3 No Different than U.S. National Rate
4 No Different than U.S. National Rate
5 No Different than U.S. National Rate
6 Better than U.S. National Rate
Lower.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1 8.6
2 11.3
3 11.2
4 11.6
5 11.4
6 6.8
Upper.Mortality.Estimate...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1 13.7
2 17.0
3 15.8
4 19.0
5 21.5
6 11.0
Number.of.Patients...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1 371
2 372
3 836
4 239
5 61
6 315
Footnote...Hospital.30.Day.Death..Mortality..Rates.from.Pneumonia
1
2
3
4
5
6
1 19.0
2 Not Available
3 17.8
4 Not Available
5 Not Available
6 Not Available
1 No Different than U.S. National Rate
2 Number of Cases Too Small
3 No Different than U.S. National Rate
4 Number of Cases Too Small
5 Number of Cases Too Small
6 Number of Cases Too Small
1 16.6
2 Not Available
3 14.9
4 Not Available
5 Not Available
6 Not Available
1 21.7
2 Not Available
3 21.5
4 Not Available
5 Not Available
6 Not Available
1 728
2 21
3 342
4 1
5 4
6 13
1
2 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
3
4 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
5 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
6 number of cases is too small (fewer than 25) to reliably tell how well the hospital is performing
1 23.7
2 22.5
3 19.8
4 27.1
5 24.7
6 23.9
1 No Different than U.S. National Rate
2 No Different than U.S. National Rate
3 Better than U.S. National Rate
4 No Different than U.S. National Rate
5 No Different than U.S. National Rate
6 No Different than U.S. National Rate
1 21.3
2 19.2
3 17.2
4 22.4
5 19.9
6 20.1
1 26.5
2 26.1
3 22.9
4 31.9
5 30.2
6 28.2
1 891
2 264
3 614
4 135
5 59
6 173
1
2
3
4
5
6
1 17.1
2 17.6
3 16.9
4 19.4
5 18.0
6 18.7
1 No Different than U.S. National Rate
2 No Different than U.S. National Rate
3 No Different than U.S. National Rate
4 No Different than U.S. National Rate
5 No Different than U.S. National Rate
6 No Different than U.S. National Rate
1 14.4
2 15.0
3 14.7
4 15.9
5 14.0
6 15.7
1 20.4
2 20.6
3 19.5
4 23.2
5 22.8
6 22.2
1 400
2 374
3 842
4 254
5 56
6 326
1
2
3
4
5
6
>

Thanks
Sean
• ###### 5. Re: Creating a subset in R
Hi Sean,

Here's some sample code that subsets the data based on the factor level counts in the 'State' variable:

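# Toy data: 30 rows for AL, 10 for AK, 5 for AR
outcome <- data.frame(State = rep(c("AL", "AK", "AR"), c(30, 10, 5)), Var2 = rnorm(45))
# Count the rows (hospitals) in each state
tab <- table(outcome$State)
tab
# Names of the states with 20 or more rows
mynames <- names(tab)[tab >= 20]
mynames
# Keep only the rows whose state is in that list
outcome.sub <- outcome[outcome$State %in% mynames, ]
outcome.sub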

You can modify this code for your data frame. I hope it helps.
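
As a possible alternative, here is a sketch that keeps the subset() call from the original question by first working out, for each row, how many rows share its state, using ave(). The toy data and the names n_in_state and outcome2 are illustrative assumptions, not part of the real data set.

```r
# Toy data standing in for the real 'outcome' data frame (assumption)
outcome <- data.frame(State = rep(c("AL", "AK", "AR"), c(30, 10, 5)),
                      Var2 = rnorm(45),
                      stringsAsFactors = FALSE)

# For every row, count how many rows share that row's State
n_in_state <- ave(seq_len(nrow(outcome)), outcome$State, FUN = length)

# Keep the rows whose state appears 20 or more times
outcome2 <- subset(outcome, n_in_state >= 20)
table(outcome2$State)
```

The condition is now a per-row count rather than the state abbreviation itself, which is the step the first attempt was missing.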

Sherry
• ###### 6. Re: Creating a subset in R
Hi Sherry,

I understand how that code would work, but I have all 50 states, not 3. There must be a way of doing this without manually typing in the number of hospitals for each of the 50 states before I exclude those with fewer than 20. Otherwise, I could just type out the states that have 20 or more myself. Does this make sense?
• ###### 7. Re: Creating a subset in R
Sean,

If you are referring to this syntax in my example:

outcome <- data.frame(State = rep(c("AL", "AK", "AR"), c(30, 10, 5)), Var2 = rnorm(45))

This is just a toy data frame to simulate your data. You won't need to type each state abbreviation or its count, because your data frame, 'outcome', already exists in your global environment and table() counts the hospitals in every state for you. Skip that line and run the rest of the code on your outcome data instead of my test data.
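
To illustrate that nothing has to be typed per state, here is a sketch on simulated data covering all 50 states. It assumes the built-in vector state.abb can stand in for the real State column; the row counts per state are random, and the names sim, n_per_state, tab, keep and sim.sub are made up for the example.

```r
set.seed(1)  # reproducible simulated counts

# Simulated stand-in for 'outcome': each of the 50 states gets
# between 5 and 100 rows (hospitals)
n_per_state <- sample(5:100, length(state.abb), replace = TRUE)
sim <- data.frame(State = factor(rep(state.abb, n_per_state)))

# table() counts the rows for every state in one call
tab <- table(sim$State)

# Names of the states with 20 or more rows
keep <- names(tab)[tab >= 20]

# Keep those rows, then drop the now-empty factor levels so that
# table() no longer lists the removed states with a count of 0
sim.sub <- droplevels(sim[sim$State %in% keep, , drop = FALSE])
table(sim.sub$State)
```

With the real data, the same steps would start from table(outcome$State). The droplevels() call only matters when State is a factor.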

Hope it helps,

Sherry
