python - Pandas: Select based on number of items in a set
I have a dataframe like this (simplified):
    year id  value
0   2000  A      0
1   2001  A      1
2   2000  A      2
3   2000  B      3
4   2001  B      4
5   2000  C      5
6   2001  C      6
7   2000  D      7
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12
I am only interested in those IDs that exist for 3 or more distinct years. I can loop over each ID and test how many distinct years it has:

    frames = []
    for i in set(df['id']):
        # keep the rows of this ID only if it spans 3 or more distinct years
        if len(set(df[df['id'] == i]['year'])) >= 3:
            frames.append(df[df['id'] == i])
    df2 = pd.concat(frames)

which gives:

    year id  value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12
But it seems that there should be a simpler way.
Use:
    df.groupby('id').filter(lambda x: x['year'].nunique() > 2)

    year id  value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12

[5 rows x 3 columns]
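For anyone who wants to try this end to end, here is a minimal sketch. The frame below is made-up sample data in the same shape as the question (it is an assumption, not the asker's exact data), and the names `kept` and `kept_fast` are only illustrative. It shows the `groupby(...).filter(...)` approach and, as a possible alternative, a boolean mask built with `transform('nunique')`, which avoids calling a Python lambda once per group.

```python
import pandas as pd

# Made-up sample data (assumption): A and B span 2 distinct years, E spans 4.
df = pd.DataFrame({
    "year": [2000, 2001, 2000, 2000, 2001, 1990, 1991, 1993, 1993, 1994],
    "id": ["A", "A", "A", "B", "B", "E", "E", "E", "E", "E"],
    "value": range(10),
})

# Keep every row whose id appears in 3 or more distinct years.
kept = df.groupby("id").filter(lambda g: g["year"].nunique() >= 3)

# Alternative: count distinct years per id, broadcast the count back
# to each row, and use the comparison as a boolean mask.
mask = df.groupby("id")["year"].transform("nunique") >= 3
kept_fast = df[mask]

print(kept)
```

Both variants keep the original row order and index, so on this sample they should select the same five `E` rows. Note that a repeated year (1993 appears twice for `E`) is counted once, which is what "exists for 3 or more years" seems to call for.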