pandas.Grouper#
- class pandas.Grouper(*args, **kwargs)[source]#
- A Grouper allows the user to specify a groupby instruction for an object. - This specification will select a column via the key parameter, or if the level and/or axis parameters are given, a level of the index of the target object. - If axis and/or level are passed as keywords to both Grouper and groupby, the values passed to Grouper take precedence. - Parameters:
- keystr, defaults to None
- Groupby key, which selects the grouping column of the target. 
- levelname/number, defaults to None
- The level for the target index. 
- freqstr / frequency object, defaults to None
- This will groupby the specified frequency if the target selection (via key or level) is a datetime-like object. For full specification of available frequencies, please see here. 
- axisstr, int, defaults to 0
- Number/name of the axis. 
- sortbool, default to False
- Whether to sort the resulting labels. 
- closed{‘left’ or ‘right’}
- Closed end of interval. Only when freq parameter is passed. 
- label{‘left’ or ‘right’}
- Interval boundary to use for labeling. Only when freq parameter is passed. 
- convention{‘start’, ‘end’, ‘e’, ‘s’}
- If grouper is PeriodIndex and freq parameter is passed. 
- originTimestamp or str, default ‘start_day’
- The timestamp on which to adjust the grouping. The timezone of origin must match the timezone of the index. If string, must be one of the following: - ‘epoch’: origin is 1970-01-01 
- ‘start’: origin is the first value of the timeseries 
- ‘start_day’: origin is the first day at midnight of the timeseries 
- ‘end’: origin is the last value of the timeseries 
- ‘end_day’: origin is the ceiling midnight of the last day 
 - New in version 1.3.0. 
- offsetTimedelta or str, default is None
- An offset timedelta added to the origin. 
- dropnabool, default True
- If True, and if group keys contain NA values, NA values together with row/column will be dropped. If False, NA values will also be treated as the key in groups. - New in version 1.2.0. 
 
- Returns:
- Grouper or pandas.api.typing.TimeGrouper
- A TimeGrouper is returned if - freqis not- None. Otherwise, a Grouper is returned.
 
 - Examples - df.groupby(pd.Grouper(key="Animal"))is equivalent to- df.groupby('Animal')- >>> df = pd.DataFrame( ... { ... "Animal": ["Falcon", "Parrot", "Falcon", "Falcon", "Parrot"], ... "Speed": [100, 5, 200, 300, 15], ... } ... ) >>> df Animal Speed 0 Falcon 100 1 Parrot 5 2 Falcon 200 3 Falcon 300 4 Parrot 15 >>> df.groupby(pd.Grouper(key="Animal")).mean() Speed Animal Falcon 200.0 Parrot 10.0 - Specify a resample operation on the column ‘Publish date’ - >>> df = pd.DataFrame( ... { ... "Publish date": [ ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-02"), ... pd.Timestamp("2000-01-09"), ... pd.Timestamp("2000-01-16") ... ], ... "ID": [0, 1, 2, 3], ... "Price": [10, 20, 30, 40] ... } ... ) >>> df Publish date ID Price 0 2000-01-02 0 10 1 2000-01-02 1 20 2 2000-01-09 2 30 3 2000-01-16 3 40 >>> df.groupby(pd.Grouper(key="Publish date", freq="1W")).mean() ID Price Publish date 2000-01-02 0.5 15.0 2000-01-09 2.0 30.0 2000-01-16 3.0 40.0 - If you want to adjust the start of the bins based on a fixed timestamp: - >>> start, end = '2000-10-01 23:30:00', '2000-10-02 00:30:00' >>> rng = pd.date_range(start, end, freq='7min') >>> ts = pd.Series(np.arange(len(rng)) * 3, index=rng) >>> ts 2000-10-01 23:30:00 0 2000-10-01 23:37:00 3 2000-10-01 23:44:00 6 2000-10-01 23:51:00 9 2000-10-01 23:58:00 12 2000-10-02 00:05:00 15 2000-10-02 00:12:00 18 2000-10-02 00:19:00 21 2000-10-02 00:26:00 24 Freq: 7min, dtype: int64 - >>> ts.groupby(pd.Grouper(freq='17min')).sum() 2000-10-01 23:14:00 0 2000-10-01 23:31:00 9 2000-10-01 23:48:00 21 2000-10-02 00:05:00 54 2000-10-02 00:22:00 24 Freq: 17min, dtype: int64 - >>> ts.groupby(pd.Grouper(freq='17min', origin='epoch')).sum() 2000-10-01 23:18:00 0 2000-10-01 23:35:00 18 2000-10-01 23:52:00 27 2000-10-02 00:09:00 39 2000-10-02 00:26:00 24 Freq: 17min, dtype: int64 - >>> ts.groupby(pd.Grouper(freq='17W', origin='2000-01-01')).sum() 2000-01-02 0 2000-04-30 0 2000-08-27 0 2000-12-24 108 Freq: 17W-SUN, dtype: int64 - If you want to adjust the start of the bins with an offset Timedelta, the two following lines are equivalent: - >>> ts.groupby(pd.Grouper(freq='17min', origin='start')).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64 - >>> ts.groupby(pd.Grouper(freq='17min', offset='23h30min')).sum() 2000-10-01 23:30:00 9 2000-10-01 23:47:00 21 2000-10-02 00:04:00 54 2000-10-02 00:21:00 24 Freq: 17min, dtype: int64 - To replace the use of the deprecated base argument, you can now use offset, in this example it is equivalent to have base=2: - >>> ts.groupby(pd.Grouper(freq='17min', offset='2min')).sum() 2000-10-01 23:16:00 0 2000-10-01 23:33:00 9 2000-10-01 23:50:00 36 2000-10-02 00:07:00 39 2000-10-02 00:24:00 24 Freq: 17min, dtype: int64