What’s new in 2.0.0 (April 3, 2023)#
These are the changes in pandas 2.0.0. See Release notes for a full changelog including other versions of pandas.
Enhancements#
Installing optional dependencies with pip extras#
When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.
pip install "pandas[performance, aws]>=2.0.0"
The available extras, found in the installation guide, are
[all, performance, computation, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
sql-other, html, xml, plot, output_formatting, clipboard, compression, test] (GH 39164).
Index can now hold numpy numeric dtypes#
It is now possible to use any numpy numeric dtype in an Index (GH 42717).
Previously it was only possible to use int64, uint64 & float64 dtypes:
In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Int64Index([1, 2, 3], dtype="int64")
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: UInt64Index([1, 2, 3], dtype="uint64")
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Float64Index([1.0, 2.0, 3.0], dtype="float64")
Int64Index, UInt64Index & Float64Index were deprecated in pandas
version 1.4 and have now been removed. Instead Index should be used directly, and
it can now take all numpy numeric dtypes, i.e.
int8/int16/int32/int64/uint8/uint16/uint32/uint64/float32/float64 dtypes:
In [1]: pd.Index([1, 2, 3], dtype=np.int8)
Out[1]: Index([1, 2, 3], dtype='int8')
In [2]: pd.Index([1, 2, 3], dtype=np.uint16)
Out[2]: Index([1, 2, 3], dtype='uint16')
In [3]: pd.Index([1, 2, 3], dtype=np.float32)
Out[3]: Index([1.0, 2.0, 3.0], dtype='float32')
The ability for Index to hold the numpy numeric dtypes has meant some changes in pandas
functionality. In particular, operations that previously were forced to create 64-bit indexes
can now create indexes with lower bit sizes, e.g. 32-bit indexes.
Below is a possibly non-exhaustive list of changes:
- Instantiating using a numpy numeric array now follows the dtype of the numpy array. Previously, all indexes created from numpy numeric arrays were forced to 64-bit. Now, for example, Index(np.array([1, 2, 3])) will be int32 on 32-bit systems, where it previously would have been int64 even on 32-bit systems. Instantiating Index using a list of numbers will still return 64-bit dtypes, e.g. Index([1, 2, 3]) will have an int64 dtype, which is the same as previously.
- The various numeric datetime attributes of DatetimeIndex (day, month, year etc.) were previously of dtype int64, while they were int32 for arrays.DatetimeArray. They are now int32 on DatetimeIndex also:
In [4]: idx = pd.date_range(start='1/1/2018', periods=3, freq='ME')
In [5]: idx.array.year
Out[5]: array([2018, 2018, 2018], dtype=int32)
In [6]: idx.year
Out[6]: Index([2018, 2018, 2018], dtype='int32')
- Level dtypes on Indexes from Series.sparse.from_coo() are now of dtype int32, the same as they are on the rows/cols on a scipy sparse matrix. Previously they were of dtype int64.
In [7]: from scipy import sparse
In [8]: A = sparse.coo_matrix(
...: ([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])), shape=(3, 4)
...: )
...:
In [9]: ser = pd.Series.sparse.from_coo(A)
In [10]: ser.index.dtypes
Out[10]:
level_0 int32
level_1 int32
dtype: object
- Index cannot be instantiated using a float16 dtype. Previously instantiating an Index using dtype float16 resulted in a Float64Index with a float64 dtype. It now raises a NotImplementedError:
In [11]: pd.Index([1, 2, 3], dtype=np.float16)
---------------------------------------------------------------------------
NotImplementedError Traceback (most recent call last)
Cell In[11], line 1
----> 1 pd.Index([1, 2, 3], dtype=np.float16)
File ~/work/pandas/pandas/pandas/core/indexes/base.py:565, in Index.__new__(cls, data, dtype, copy, name, tupleize_cols)
561 arr = ensure_wrapped_if_datetimelike(arr)
563 klass = cls._dtype_to_subclass(arr.dtype)
--> 565 arr = klass._ensure_array(arr, arr.dtype, copy=False)
566 return klass._simple_new(arr, name, refs=refs)
File ~/work/pandas/pandas/pandas/core/indexes/base.py:578, in Index._ensure_array(cls, data, dtype, copy)
575 raise ValueError("Index data must be 1-dimensional")
576 elif dtype == np.float16:
577 # float16 not supported (no indexing engine)
--> 578 raise NotImplementedError("float16 indexes are not supported")
580 if copy:
581 # asarray_tuplesafe does not always copy underlying data,
582 # so need to make sure that this happens
583 data = data.copy()
NotImplementedError: float16 indexes are not supported
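The first and last points in the list above can be sketched together. This is a minimal illustration, assuming pandas 2.0 or later; the variable names are made up for the example and do not come from the release notes.

```python
import numpy as np
import pandas as pd

# The Index keeps the dtype of the NumPy array it is built from.
from_array = pd.Index(np.array([1, 2, 3], dtype=np.int16))
print(from_array.dtype)

# A plain list of Python integers still produces a 64-bit Index.
from_list = pd.Index([1, 2, 3])
print(from_list.dtype)

# float16 is not supported; per the notes above this raises NotImplementedError.
try:
    pd.Index([1, 2, 3], dtype=np.float16)
    float16_rejected = False
except NotImplementedError:
    float16_rejected = True
print(float16_rejected)
```

Code that assumed every numeric Index is 64-bit (for example when comparing dtypes) may need to compare against the actual dtype of the input array instead.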
Argument dtype_backend, to return pyarrow-backed or numpy-backed nullable dtypes#
The following functions gained a new keyword dtype_backend (GH 36712)
When this option is set to "numpy_nullable" it will return a DataFrame that is
backed by nullable dtypes.
When this keyword is set to "pyarrow", then these functions will return pyarrow-backed nullable ArrowDtype DataFrames (GH 48957, GH 49997):
In [12]: import io
In [13]: data = io.StringIO("""a,b,c,d,e,f,g,h,i
....: 1,2.5,True,a,,,,,
....: 3,4.5,False,b,6,7.5,True,a,
....: """)
....:
In [14]: df = pd.read_csv(data, dtype_backend="pyarrow")
In [15]: df.dtypes
Out[15]:
a int64[pyarrow]
b double[pyarrow]
c bool[pyarrow]
d string[pyarrow]
e int64[pyarrow]
f double[pyarrow]
g bool[pyarrow]
h string[pyarrow]
i null[pyarrow]
dtype: object
In [16]: data.seek(0)
Out[16]: 0
In [17]: df_pyarrow = pd.read_csv(data, dtype_backend="pyarrow", engine="pyarrow")
In [18]: df_pyarrow.dtypes
Out[18]:
a int64[pyarrow]
b double[pyarrow]
c bool[pyarrow]
d string[pyarrow]
e int64[pyarrow]
f double[pyarrow]
g bool[pyarrow]
h string[pyarrow]
i null[pyarrow]
dtype: object
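The examples above use the "pyarrow" backend. For comparison, here is a minimal sketch of the "numpy_nullable" setting, assuming pandas 2.0 or later; the small CSV payload is invented for illustration. It does not require pyarrow to be installed.

```python
import io

import pandas as pd

# Column "a" has a missing value in the second row.
data = io.StringIO("a,b\n1,x\n,y\n")

# With dtype_backend="numpy_nullable", integer columns with missing values
# stay integers (nullable Int64) instead of being upcast to float64.
df = pd.read_csv(data, dtype_backend="numpy_nullable")
print(df.dtypes)

missing = int(df["a"].isna().sum())
print(missing)
```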
Copy-on-Write improvements#
- A new lazy copy mechanism that defers the copy until the object in question is modified was added to the methods listed in Copy-on-Write optimizations. These methods return views when Copy-on-Write is enabled, which provides a significant performance improvement compared to the regular execution (GH 49473).
- Accessing a single column of a DataFrame as a Series (e.g. df["col"]) now always returns a new object every time it is constructed when Copy-on-Write is enabled (rather than repeatedly returning an identical, cached Series object). This ensures that those Series objects correctly follow the Copy-on-Write rules (GH 49450)
- The Series constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing a Series from an existing Series with the default of copy=False (GH 50471)
- The DataFrame constructor will now create a lazy copy (deferring the copy until a modification to the data happens) when constructing from an existing DataFrame with the default of copy=False (GH 51239)
- The DataFrame constructor, when constructing a DataFrame from a dictionary of Series objects and specifying copy=False, will now use a lazy copy of those Series objects for the columns of the DataFrame (GH 50777)
- The DataFrame constructor, when constructing a DataFrame from a Series or Index and specifying copy=False, will now respect Copy-on-Write.
- The DataFrame and Series constructors, when constructing from a NumPy array, will now copy the array by default to avoid mutating the DataFrame / Series when mutating the array. Specify copy=False to get the old behavior. When setting copy=False pandas does not guarantee correct Copy-on-Write behavior when the NumPy array is modified after creation of the DataFrame / Series.
- DataFrame.from_records() will now respect Copy-on-Write when called with a DataFrame.
- Trying to set values using chained assignment (for example, df["a"][1:3] = 0) will now always raise a warning when Copy-on-Write is enabled. In this mode, chained assignment can never work because we are always setting into a temporary object that is the result of an indexing operation (getitem), which under Copy-on-Write always behaves as a copy. Thus, assigning through a chain can never update the original Series or DataFrame. Therefore, an informative warning is raised to the user to avoid silently doing nothing (GH 49467)
- DataFrame.replace() will now respect the Copy-on-Write mechanism when inplace=True.
- DataFrame.transpose() will now respect the Copy-on-Write mechanism.
- Arithmetic operations that can be inplace, e.g. ser *= 2 will now respect the Copy-on-Write mechanism.
- DataFrame.__getitem__() will now respect the Copy-on-Write mechanism when the DataFrame has MultiIndex columns.
- Series.__getitem__() will now respect the Copy-on-Write mechanism when the Series has a MultiIndex.
- Series.view() will now respect the Copy-on-Write mechanism.
Copy-on-Write can be enabled through one of
pd.set_option("mode.copy_on_write", True)
pd.options.mode.copy_on_write = True
Alternatively, Copy-on-Write can be enabled locally through:
with pd.option_context("mode.copy_on_write", True):
...
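To make the lazy-copy rules above concrete, here is a minimal sketch, assuming pandas 2.0 or later. The frame and column names are invented for the example. A column selected from a DataFrame behaves as a copy once it is modified, so the parent is left untouched.

```python
import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3]})

    # Selecting a column returns a new Series that shares data lazily.
    ser = df["a"]

    # Modifying the Series triggers the deferred copy; df is not changed.
    ser.iloc[0] = 100

    parent_value = int(df.loc[0, "a"])
    child_value = int(ser.iloc[0])

print(parent_value, child_value)  # 1 100
```

To update the parent under Copy-on-Write, assign in a single step instead, for example with df.loc[0, "a"] = 100.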
Other enhancements#
- Added support for str accessor methods when using ArrowDtype with a pyarrow.string type (GH 50325)
- Added support for dt accessor methods when using ArrowDtype with a pyarrow.timestamp type (GH 50954)
- read_sas() now supports using encoding='infer' to correctly read and use the encoding specified by the sas file. (GH 48048)
- DataFrameGroupBy.quantile(), SeriesGroupBy.quantile() and DataFrameGroupBy.std() now preserve nullable dtypes instead of casting to numpy dtypes (GH 37493)
- DataFrameGroupBy.std(), SeriesGroupBy.std() now support datetime64, timedelta64, and DatetimeTZDtype dtypes (GH 48481)
- Series.add_suffix(), DataFrame.add_suffix(), Series.add_prefix() and DataFrame.add_prefix() support an axis argument. If axis is set, the default behaviour of which axis to consider can be overwritten (GH 47819)
- testing.assert_frame_equal() now shows the first element where the DataFrames differ, analogously to pytest’s output (GH 47910)
- Added index parameter to DataFrame.to_dict() (GH 46398)
- Added support for extension array dtypes in merge() (GH 44240)
- Added metadata propagation for binary operators on DataFrame (GH 28283)
- Added cumsum, cumprod, cummin and cummax to the ExtensionArray interface via _accumulate (GH 28385)
- CategoricalConversionWarning, InvalidComparison, InvalidVersion, LossySetitemError, and NoBufferPresent are now exposed in pandas.errors (GH 27656)
- Fix test optional_extra by adding missing test package pytest-asyncio (GH 48361)
- DataFrame.astype() exception message thrown improved to include column name when type conversion is not possible. (GH 47571)
- date_range() now supports a unit keyword (“s”, “ms”, “us”, or “ns”) to specify the desired resolution of the output index (GH 49106)
- timedelta_range() now supports a unit keyword (“s”, “ms”, “us”, or “ns”) to specify the desired resolution of the output index (GH 49824)
- DataFrame.to_json() now supports a mode keyword with supported inputs ‘w’ and ‘a’. Defaulting to ‘w’, ‘a’ can be used when lines=True and orient='records' to append record oriented json lines to an existing json file. (GH 35849)
- Added name parameter to IntervalIndex.from_breaks(), IntervalIndex.from_arrays() and IntervalIndex.from_tuples() (GH 48911)
- Improve exception message when using testing.assert_frame_equal() on a DataFrame to include the column that is compared (GH 50323)
- Improved error message for merge_asof() when join-columns were duplicated (GH 50102)
- Added support for extension array dtypes to get_dummies() (GH 32430)
- Added Index.infer_objects() analogous to Series.infer_objects() (GH 50034)
- Added copy parameter to Series.infer_objects() and DataFrame.infer_objects(), passing False will avoid making copies for series or columns that are already non-object or where no better dtype can be inferred (GH 50096)
- DataFrame.plot.hist() now recognizes xlabel and ylabel arguments (GH 49793)
- Series.drop_duplicates() has gained ignore_index keyword to reset index (GH 48304)
- Series.dropna() and DataFrame.dropna() have gained ignore_index keyword to reset index (GH 31725)
- Improved error message in to_datetime() for non-ISO8601 formats, informing users about the position of the first error (GH 50361)
- Improved error message when trying to align DataFrame objects (for example, in DataFrame.compare()) to clarify that “identically labelled” refers to both index and columns (GH 50083)
- Added support for Index.min() and Index.max() for pyarrow string dtypes (GH 51397)
- Added DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns” (GH 50616)
- Added Series.dt.unit() and Series.dt.as_unit() to convert to different resolutions; supported resolutions are “s”, “ms”, “us”, and “ns” (GH 51223)
- Added new argument dtype to read_sql() to be consistent with read_sql_query() (GH 50797)
- read_csv(), read_table(), read_fwf() and read_excel() now accept date_format (GH 50601)
- to_datetime() now accepts "ISO8601" as an argument to format, which will match any ISO8601 string (but possibly not identically-formatted) (GH 50411)
- to_datetime() now accepts "mixed" as an argument to format, which will infer the format for each element individually (GH 50972)
- Added new argument engine to read_json() to support parsing JSON with pyarrow by specifying engine="pyarrow" (GH 48893)
- Added support for SQLAlchemy 2.0 (GH 40686)
- Added support for decimal parameter when engine="pyarrow" in read_csv() (GH 51302)
- Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True, which will always return a sorted result, unlike the default sort=None which does not sort in some cases (GH 25151)
- Added new escape mode “latex-math” to avoid escaping “$” in formatter (GH 50040)
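Several of the items above concern non-nanosecond resolutions. As a minimal sketch, assuming pandas 2.0 or later, the unit keyword of date_range() and the as_unit() methods can be combined like this (the variable names are illustrative):

```python
import pandas as pd

# Build an index directly at second resolution.
idx = pd.date_range("2016-01-01", periods=3, freq="D", unit="s")
print(idx.dtype)

# Convert an existing index to millisecond resolution.
idx_ms = idx.as_unit("ms")
print(idx_ms.dtype)

# The same conversion is available on a Series through the dt accessor.
ser_us = pd.Series(idx).dt.as_unit("us")
print(ser_us.dtype)
```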
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
DataFrameGroupBy.cumsum() and DataFrameGroupBy.cumprod() overflow instead of lossy casting to float#
In previous versions we cast to float when applying cumsum and cumprod, which
led to incorrect results even if the result could be held by the int64 dtype.
Additionally, the aggregation now overflows consistently with numpy and the regular
DataFrame.cumprod() and DataFrame.cumsum() methods when the limit of
int64 is reached (GH 37493).
Old Behavior
In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16
We return incorrect results with the 6th value.
New Behavior
In [19]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [20]: df.groupby("key")["value"].cumprod()
Out[20]:
0 625
1 390625
2 244140625
3 152587890625
4 95367431640625
5 59604644775390625
6 359414837200037393
Name: value, dtype: int64
We overflow with the 7th value, but the 6th value is still correct.
DataFrameGroupBy.nth() and SeriesGroupBy.nth() now behave as filtrations#
In previous versions of pandas, DataFrameGroupBy.nth() and
SeriesGroupBy.nth() acted as if they were aggregations. However, for most
inputs n, they may return either zero or multiple rows per group. This means
that they are filtrations, similar to e.g. DataFrameGroupBy.head(). pandas
now treats them as filtrations (GH 13666).
In [21]: df = pd.DataFrame({"a": [1, 1, 2, 1, 2], "b": [np.nan, 2.0, 3.0, 4.0, 5.0]})
In [22]: gb = df.groupby("a")
Old Behavior
In [5]: gb.nth(n=1)
Out[5]:
A B
1 1 2.0
4 2 5.0
New Behavior
In [23]: gb.nth(n=1)
Out[23]:
a b
1 1 2.0
4 2 5.0
In particular, the index of the result is derived from the input by selecting
the appropriate rows. Also, when n is larger than the group, no rows are
returned instead of NaN.
Old Behavior
In [5]: gb.nth(n=3, dropna="any")
Out[5]:
B
A
1 NaN
2 NaN
New Behavior
In [24]: gb.nth(n=3, dropna="any")
Out[24]:
Empty DataFrame
Columns: [a, b]
Index: []
Backwards incompatible API changes#
Construction with datetime64 or timedelta64 dtype with unsupported resolution#
In past versions, when constructing a Series or DataFrame and
passing a “datetime64” or “timedelta64” dtype with unsupported resolution
(i.e. anything other than “ns”), pandas would silently replace the given dtype
with its nanosecond analogue:
Previous behavior:
In [5]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[5]:
0 2016-01-01
dtype: datetime64[ns]
In [6]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
Out[6]:
0 2016-01-01
dtype: datetime64[ns]
In pandas 2.0 we support resolutions “s”, “ms”, “us”, and “ns”. When passing a supported dtype (e.g. “datetime64[s]”), the result now has exactly the requested dtype:
New behavior:
In [25]: pd.Series(["2016-01-01"], dtype="datetime64[s]")
Out[25]:
0 2016-01-01
dtype: datetime64[s]
With an unsupported dtype, pandas now raises instead of silently swapping in a supported dtype:
New behavior:
In [26]: pd.Series(["2016-01-01"], dtype="datetime64[D]")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[26], line 1
----> 1 pd.Series(["2016-01-01"], dtype="datetime64[D]")
File ~/work/pandas/pandas/pandas/core/series.py:511, in Series.__init__(self, data, index, dtype, name, copy, fastpath)
509 data = data.copy()
510 else:
--> 511 data = sanitize_array(data, index, dtype, copy)
513 manager = _get_option("mode.data_manager", silent=True)
514 if manager == "block":
File ~/work/pandas/pandas/pandas/core/construction.py:633, in sanitize_array(data, index, dtype, copy, allow_2d)
630 subarr = np.array([], dtype=np.float64)
632 elif dtype is not None:
--> 633 subarr = _try_cast(data, dtype, copy)
635 else:
636 subarr = maybe_convert_platform(data)
File ~/work/pandas/pandas/pandas/core/construction.py:790, in _try_cast(arr, dtype, copy)
785 return lib.ensure_string_array(arr, convert_na_value=False, copy=copy).reshape(
786 shape
787 )
789 elif dtype.kind in "mM":
--> 790 return maybe_cast_to_datetime(arr, dtype)
792 # GH#15832: Check if we are requesting a numeric dtype and
793 # that we can convert the data to the requested dtype.
794 elif dtype.kind in "iu":
795 # this will raise if we have e.g. floats
File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1219, in maybe_cast_to_datetime(value, dtype)
1215 raise TypeError("value must be listlike")
1217 # TODO: _from_sequence would raise ValueError in cases where
1218 # _ensure_nanosecond_dtype raises TypeError
-> 1219 _ensure_nanosecond_dtype(dtype)
1221 if lib.is_np_dtype(dtype, "m"):
1222 res = TimedeltaArray._from_sequence(value, dtype=dtype)
File ~/work/pandas/pandas/pandas/core/dtypes/cast.py:1277, in _ensure_nanosecond_dtype(dtype)
1274 raise ValueError(msg)
1275 # TODO: ValueError or TypeError? existing test
1276 # test_constructor_generic_timestamp_bad_frequency expects TypeError
-> 1277 raise TypeError(
1278 f"dtype={dtype} is not supported. Supported resolutions are 's', "
1279 "'ms', 'us', and 'ns'"
1280 )
TypeError: dtype=datetime64[D] is not supported. Supported resolutions are 's', 'ms', 'us', and 'ns'
Value counts sets the resulting name to count#
In past versions, when running Series.value_counts(), the result would inherit
the original object’s name, and the result index would be nameless. This would cause
confusion when resetting the index, and the column names would not correspond with the
column values.
Now, the result name will be 'count' (or 'proportion' if normalize=True was passed),
and the index will be named after the original object (GH 49497).
Previous behavior:
In [8]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[8]:
quetzal 2
elk 1
Name: animal, dtype: int64
New behavior:
In [27]: pd.Series(['quetzal', 'quetzal', 'elk'], name='animal').value_counts()
Out[27]:
animal
quetzal 2
elk 1
Name: count, dtype: int64
Likewise for other value_counts methods (for example, DataFrame.value_counts()).
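The practical effect shows up when the index is reset, which was the source of confusion mentioned above. A minimal sketch, assuming pandas 2.0 or later and reusing the example Series from this section:

```python
import pandas as pd

ser = pd.Series(["quetzal", "quetzal", "elk"], name="animal")

# The index is named after the Series and the values are named "count",
# so resetting the index gives correctly labelled columns.
table = ser.value_counts().reset_index()
print(list(table.columns))  # ['animal', 'count']

# With normalize=True the result is named "proportion" instead.
proportions = ser.value_counts(normalize=True)
print(proportions.name)  # proportion
```

Code that renamed the columns after reset_index() to work around the old naming may no longer need to do so.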
Disallow astype conversion to non-supported datetime64/timedelta64 dtypes#
In previous versions, converting a Series or DataFrame
from datetime64[ns] to a different datetime64[X] dtype would return
with datetime64[ns] dtype instead of the requested dtype. In pandas 2.0,
support is added for “datetime64[s]”, “datetime64[ms]”, and “datetime64[us]” dtypes,
so converting to those dtypes gives exactly the requested dtype:
In [28]: idx = pd.date_range("2016-01-01", periods=3)
In [29]: ser = pd.Series(idx)
Previous behavior:
In [4]: ser.astype("datetime64[s]")
Out[4]:
0 2016-01-01
1 2016-01-02
2 2016-01-03
dtype: datetime64[ns]
With the new behavior, we get exactly the requested dtype:
New behavior:
In [30]: ser.astype("datetime64[s]")
Out[30]:
0 2016-01-01
1 2016-01-02
2 2016-01-03
dtype: datetime64[s]
For non-supported resolutions e.g. “datetime64[D]”, we raise instead of silently ignoring the requested dtype:
New behavior:
In [31]: ser.astype("datetime64[D]")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[31], line 1
----> 1 ser.astype("datetime64[D]")
File ~/work/pandas/pandas/pandas/core/generic.py:6576, in NDFrame.astype(self, dtype, copy, errors)
6572 results = [ser.astype(dtype, copy=copy) for _, ser in self.items()]
6574 else:
6575 # else, only a single dtype is given
-> 6576 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6577 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6578 return res.__finalize__(self, method="astype")
File ~/work/pandas/pandas/pandas/core/internals/managers.py:414, in BaseBlockManager.astype(self, dtype, copy, errors)
411 elif using_copy_on_write():
412 copy = False
--> 414 return self.apply(
415 "astype",
416 dtype=dtype,
417 copy=copy,
418 errors=errors,
419 using_cow=using_copy_on_write(),
420 )
File ~/work/pandas/pandas/pandas/core/internals/managers.py:354, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
352 applied = b.apply(f, **kwargs)
353 else:
--> 354 applied = getattr(b, f)(**kwargs)
355 result_blocks = extend_blocks(applied, result_blocks)
357 out = type(self).from_blocks(result_blocks, self.axes)
File ~/work/pandas/pandas/pandas/core/internals/blocks.py:677, in Block.astype(self, dtype, copy, errors, using_cow)
657 """
658 Coerce to the new dtype.
659
(...)
673 Block
674 """
675 values = self.values
--> 677 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
679 new_values = maybe_coerce_values(new_values)
681 refs = None
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:238, in astype_array_safe(values, dtype, copy, errors)
235 dtype = dtype.numpy_dtype
237 try:
--> 238 new_values = astype_array(values, dtype, copy=copy)
239 except (ValueError, TypeError):
240 # e.g. _astype_nansafe can fail on object-dtype of strings
241 # trying to convert to float
242 if errors == "ignore":
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:180, in astype_array(values, dtype, copy)
176 return values
178 if not isinstance(values, np.ndarray):
179 # i.e. ExtensionArray
--> 180 values = values.astype(dtype, copy=copy)
182 else:
183 values = _astype_nansafe(values, dtype, copy=copy)
File ~/work/pandas/pandas/pandas/core/arrays/datetimes.py:730, in DatetimeArray.astype(self, dtype, copy)
728 elif isinstance(dtype, PeriodDtype):
729 return self.to_period(freq=dtype.freq)
--> 730 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy)
File ~/work/pandas/pandas/pandas/core/arrays/datetimelike.py:490, in DatetimeLikeArrayMixin.astype(self, dtype, copy)
486 elif (dtype.kind in "mM" and self.dtype != dtype) or dtype.kind == "f":
487 # disallow conversion between datetime/timedelta,
488 # and conversions for any datetimelike to float
489 msg = f"Cannot cast {type(self).__name__} to dtype {dtype}"
--> 490 raise TypeError(msg)
491 else:
492 return np.asarray(self, dtype=dtype)
TypeError: Cannot cast DatetimeArray to dtype datetime64[D]
For conversion from timedelta64[ns] dtypes, the old behavior converted
to a floating point format.
In [32]: idx = pd.timedelta_range("1 Day", periods=3)
In [33]: ser = pd.Series(idx)
Previous behavior:
In [7]: ser.astype("timedelta64[s]")
Out[7]:
0 86400.0
1 172800.0
2 259200.0
dtype: float64
In [8]: ser.astype("timedelta64[D]")
Out[8]:
0 1.0
1 2.0
2 3.0
dtype: float64
The new behavior, as for datetime64, either gives exactly the requested dtype or raises:
New behavior:
In [34]: ser.astype("timedelta64[s]")
Out[34]:
0 1 days
1 2 days
2 3 days
dtype: timedelta64[s]
In [35]: ser.astype("timedelta64[D]")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[35], line 1
----> 1 ser.astype("timedelta64[D]")
File ~/work/pandas/pandas/pandas/core/generic.py:6576, in NDFrame.astype(self, dtype, copy, errors)
6572 results = [ser.astype(dtype, copy=copy) for _, ser in self.items()]
6574 else:
6575 # else, only a single dtype is given
-> 6576 new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
6577 res = self._constructor_from_mgr(new_data, axes=new_data.axes)
6578 return res.__finalize__(self, method="astype")
File ~/work/pandas/pandas/pandas/core/internals/managers.py:414, in BaseBlockManager.astype(self, dtype, copy, errors)
411 elif using_copy_on_write():
412 copy = False
--> 414 return self.apply(
415 "astype",
416 dtype=dtype,
417 copy=copy,
418 errors=errors,
419 using_cow=using_copy_on_write(),
420 )
File ~/work/pandas/pandas/pandas/core/internals/managers.py:354, in BaseBlockManager.apply(self, f, align_keys, **kwargs)
352 applied = b.apply(f, **kwargs)
353 else:
--> 354 applied = getattr(b, f)(**kwargs)
355 result_blocks = extend_blocks(applied, result_blocks)
357 out = type(self).from_blocks(result_blocks, self.axes)
File ~/work/pandas/pandas/pandas/core/internals/blocks.py:677, in Block.astype(self, dtype, copy, errors, using_cow)
657 """
658 Coerce to the new dtype.
659
(...)
673 Block
674 """
675 values = self.values
--> 677 new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
679 new_values = maybe_coerce_values(new_values)
681 refs = None
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:238, in astype_array_safe(values, dtype, copy, errors)
235 dtype = dtype.numpy_dtype
237 try:
--> 238 new_values = astype_array(values, dtype, copy=copy)
239 except (ValueError, TypeError):
240 # e.g. _astype_nansafe can fail on object-dtype of strings
241 # trying to convert to float
242 if errors == "ignore":
File ~/work/pandas/pandas/pandas/core/dtypes/astype.py:180, in astype_array(values, dtype, copy)
176 return values
178 if not isinstance(values, np.ndarray):
179 # i.e. ExtensionArray
--> 180 values = values.astype(dtype, copy=copy)
182 else:
183 values = _astype_nansafe(values, dtype, copy=copy)
File ~/work/pandas/pandas/pandas/core/arrays/timedeltas.py:380, in TimedeltaArray.astype(self, dtype, copy)
376 return type(self)._simple_new(
377 res_values, dtype=res_values.dtype, freq=self.freq
378 )
379 else:
--> 380 raise ValueError(
381 f"Cannot convert from {self.dtype} to {dtype}. "
382 "Supported resolutions are 's', 'ms', 'us', 'ns'"
383 )
385 return dtl.DatetimeLikeArrayMixin.astype(self, dtype, copy=copy)
ValueError: Cannot convert from timedelta64[ns] to timedelta64[D]. Supported resolutions are 's', 'ms', 'us', 'ns'
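Code that relied on the old float results can still obtain the same numbers explicitly. The sketch below is one possible approach, not something prescribed by these notes. It uses Series.dt.total_seconds() and division by a Timedelta, both of which are long-standing pandas operations.

```python
import pandas as pd

ser = pd.Series(pd.timedelta_range("1 Day", periods=3))

# Seconds as float64, matching the old astype("timedelta64[s]") output.
seconds = ser.dt.total_seconds()
print(seconds.tolist())  # [86400.0, 172800.0, 259200.0]

# Days as float64, matching the old astype("timedelta64[D]") output.
days = ser / pd.Timedelta(days=1)
print(days.tolist())  # [1.0, 2.0, 3.0]
```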
UTC and fixed-offset timezones default to standard-library tzinfo objects#
In previous versions, the default tzinfo object used to represent UTC
was pytz.UTC. In pandas 2.0, we default to datetime.timezone.utc instead.
Similarly, for timezones that represent fixed UTC offsets, we use datetime.timezone
objects instead of pytz.FixedOffset objects (GH 34916).
Previous behavior:
In [2]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [3]: type(ts.tzinfo)
Out[3]: pytz.UTC
In [4]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [5]: type(ts2.tzinfo)
Out[5]: pytz._FixedOffset
New behavior:
In [36]: ts = pd.Timestamp("2016-01-01", tz="UTC")
In [37]: type(ts.tzinfo)
Out[37]: datetime.timezone
In [38]: ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")
In [39]: type(ts2.tzinfo)
Out[39]: datetime.timezone
For timezones that are neither UTC nor fixed offsets, e.g. “US/Pacific”, we
continue to default to pytz objects.
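Code that compared tzinfo against pytz objects may need a type-agnostic check. A minimal sketch, assuming pandas 2.0 or later, that relies only on the standard library:

```python
import datetime

import pandas as pd

ts = pd.Timestamp("2016-01-01", tz="UTC")
ts2 = pd.Timestamp("2016-01-01 04:05:06-07:00")

# Both UTC and fixed-offset timestamps now carry datetime.timezone objects.
print(isinstance(ts.tzinfo, datetime.timezone))
print(isinstance(ts2.tzinfo, datetime.timezone))

# Comparing offsets works regardless of which tzinfo implementation is used.
print(ts.utcoffset() == datetime.timedelta(0))
print(ts2.utcoffset() == datetime.timedelta(hours=-7))
```

Comparing utcoffset() values rather than tzinfo types keeps such checks working across pandas versions.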
Empty DataFrames/Series will now default to have a RangeIndex#
Before, constructing an empty Series or DataFrame (where data is None or an empty list-like argument) without
specifying the axes (index=None, columns=None) would return the axes as an empty Index with object dtype.
Now, the axes return an empty RangeIndex (GH 49572).
Previous behavior:
In [8]: pd.Series().index
Out[8]:
Index([], dtype='object')
In [9]: pd.DataFrame().axes
Out[9]:
[Index([], dtype='object'), Index([], dtype='object')]
New behavior:
In [40]: pd.Series().index
Out[40]: RangeIndex(start=0, stop=0, step=1)
In [41]: pd.DataFrame().axes
Out[41]: [RangeIndex(start=0, stop=0, step=1), RangeIndex(start=0, stop=0, step=1)]
DataFrame to LaTeX has a new render engine#
The existing DataFrame.to_latex() has been restructured to utilise the
extended implementation previously available under Styler.to_latex().
The arguments signature is similar, albeit col_space has been removed since
it is ignored by LaTeX engines. This render engine also requires jinja2 as a
dependency which needs to be installed, since rendering is based upon jinja2 templates.
The pandas latex options below are no longer used and have been removed. The generic max rows and columns arguments remain but for this functionality should be replaced by the Styler equivalents. The alternative options giving similar functionality are indicated below:
- display.latex.escape: replaced with styler.format.escape,
- display.latex.longtable: replaced with styler.latex.environment,
- display.latex.multicolumn, display.latex.multicolumn_format and display.latex.multirow: replaced with styler.sparse.rows, styler.sparse.columns, styler.latex.multirow_align and styler.latex.multicol_align,
- display.latex.repr: replaced with styler.render.repr,
- display.max_rows and display.max_columns: replaced with styler.render.max_rows, styler.render.max_columns and styler.render.max_elements.
Note that due to this change some defaults have also changed:
- multirow now defaults to True.
- multirow_align defaults to “r” instead of “l”.
- multicol_align defaults to “r” instead of “l”.
- escape now defaults to False.
Note that the behaviour of _repr_latex_ has also changed. Previously,
setting display.latex.repr would generate LaTeX only when using nbconvert for a
Jupyter Notebook, and not when the user is running the notebook. Now the
styler.render.repr option allows control of the specific output
within Jupyter Notebooks for operations (not just on nbconvert). See GH 39911.
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| mypy (dev) | 1.0 | X | |
| pytest (dev) | 7.0.0 | X | |
| pytest-xdist (dev) | 2.2.0 | X | |
| hypothesis (dev) | 6.34.2 | X | |
| python-dateutil | 2.8.2 | X | X |
| tzdata | 2022.1 | X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| pyarrow | 7.0.0 | X |
| matplotlib | 3.6.1 | X |
| fastparquet | 0.6.3 | X |
| xarray | 0.21.0 | X |
See Dependencies and Optional dependencies for more.
Datetimes are now parsed with a consistent format#
In the past, to_datetime() guessed the format for each element independently. This was appropriate for some cases where elements had mixed date formats - however, it would regularly cause problems when users expected a consistent format but the function would switch formats between elements. As of version 2.0.0, parsing will use a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).
Old behavior:
In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0 2000-01-13
1 2000-12-01
dtype: datetime64[ns]
New behavior:
In [42]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [43]: pd.to_datetime(ser)
Out[43]:
0 2000-01-13
1 2000-01-12
dtype: datetime64[ns]
Note that this affects read_csv() as well.
If you still need to parse dates with inconsistent formats, you can use
format='mixed' (possibly alongside dayfirst):
ser = pd.Series(['13-01-2000', '12 January 2000'])
pd.to_datetime(ser, format='mixed', dayfirst=True)
or, if your formats are all ISO8601 (but possibly not identically-formatted):
ser = pd.Series(['2020-01-01', '2020-01-01 03:00'])
pd.to_datetime(ser, format='ISO8601')
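When the format is known in advance, passing it explicitly avoids relying on inference from the first non-NA value. The following is a minimal sketch, not taken from the release notes; the sample strings and the choice of errors="coerce" are illustrative assumptions.

```python
import pandas as pd

ser = pd.Series(["13-01-2000", "12-01-2000"])

# An explicit day-first format applies to every element,
# so nothing depends on what the first value looks like.
parsed = pd.to_datetime(ser, format="%d-%m-%Y")

# Illustrative input: the second string does not match the format.
# With errors="coerce" it becomes NaT instead of raising.
inconsistent = pd.Series(["13-01-2000", "2000/01/12"])
coerced = pd.to_datetime(inconsistent, format="%d-%m-%Y", errors="coerce")

print(parsed)
print(coerced)
```

Coercing to NaT is only one way to handle non-matching elements; leaving errors at its default makes the mismatch raise, which surfaces inconsistent input early.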
Other API changes#
The freq, tz, nanosecond, and unit keywords in the Timestamp constructor are now keyword-only (GH 45307, GH 32526)
Passing nanoseconds greater than 999 or less than 0 in Timestamp now raises a ValueError (GH 48538, GH 48255)
read_csv(): specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser.
Default value of dtype in get_dummies() is changed to bool from uint8 (GH 45848)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting datetime64 data to any of “datetime64[s]”, “datetime64[ms]”, “datetime64[us]” will return an object with the given resolution instead of coercing back to “datetime64[ns]” (GH 48928)
DataFrame.astype(), Series.astype(), and DatetimeIndex.astype() casting timedelta64 data to any of “timedelta64[s]”, “timedelta64[ms]”, “timedelta64[us]” will return an object with the given resolution instead of coercing to “float64” dtype (GH 48963)
DatetimeIndex.astype(), TimedeltaIndex.astype(), PeriodIndex.astype(), Series.astype(), DataFrame.astype() with datetime64, timedelta64 or PeriodDtype dtypes no longer allow converting to integer dtypes other than “int64”, do obj.astype('int64', copy=False).astype(dtype) instead (GH 49715)
Index.astype() now allows casting from float64 dtype to datetime-like dtypes, matching Series behavior (GH 49660)
Passing data with dtype of “timedelta64[s]”, “timedelta64[ms]”, or “timedelta64[us]” to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to “timedelta64[ns]”; timedelta64 data with lower resolution will be cast to the lowest supported resolution “timedelta64[s]” (GH 49014)
Passing dtype of “timedelta64[s]”, “timedelta64[ms]”, or “timedelta64[us]” to TimedeltaIndex, Series, or DataFrame constructors will now retain that dtype instead of casting to “timedelta64[ns]”; passing a dtype with lower resolution for Series or DataFrame will be cast to the lowest supported resolution “timedelta64[s]” (GH 49014)
Passing a np.datetime64 object with non-nanosecond resolution to Timestamp will retain the input resolution if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH 49008)
Passing datetime64 values with resolution other than nanosecond to to_datetime() will retain the input resolution if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH 50369)
Passing integer values and a non-nanosecond datetime64 dtype (e.g. “datetime64[s]”) to DataFrame, Series, or Index will treat the values as multiples of the dtype’s unit, matching the behavior of e.g. Series(np.array(values, dtype="M8[s]")) (GH 51092)
Passing a string in ISO-8601 format to Timestamp will retain the resolution of the parsed input if it is “s”, “ms”, “us”, or “ns”; otherwise it will be cast to the closest supported resolution (GH 49737)
The other argument in DataFrame.mask() and Series.mask() now defaults to no_default instead of np.nan, consistent with DataFrame.where() and Series.where(). Entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). (GH 49111)
Changed behavior of Series.quantile() and DataFrame.quantile() with SparseDtype to retain sparse dtype (GH 49583)
When creating a Series with an object-dtype Index of datetime objects, pandas no longer silently converts the index to a DatetimeIndex (GH 39307, GH 23598)
pandas.testing.assert_index_equal() with parameter exact="equiv" now considers two indexes equal when both are either a RangeIndex or Index with an int64 dtype. Previously it meant either a RangeIndex or an Int64Index (GH 51098)
Series.unique() with dtype “timedelta64[ns]” or “datetime64[ns]” now returns TimedeltaArray or DatetimeArray instead of numpy.ndarray (GH 49176)
to_datetime() and DatetimeIndex now allow sequences containing both datetime objects and numeric entries, matching Series behavior (GH 49037, GH 50453)
pandas.api.types.is_string_dtype() now only returns True for array-likes with dtype=object when the elements are inferred to be strings (GH 15585)
Passing a sequence containing datetime objects and date objects to Series constructor will return with object dtype instead of datetime64[ns] dtype, consistent with Index behavior (GH 49341)
Passing strings that cannot be parsed as datetimes to Series or DataFrame with dtype="datetime64[ns]" will raise instead of silently ignoring the keyword and returning object dtype (GH 24435)
Passing a sequence containing a type that cannot be converted to Timedelta to to_timedelta() or to the Series or DataFrame constructor with dtype="timedelta64[ns]" or to TimedeltaIndex now raises TypeError instead of ValueError (GH 49525)
Changed behavior of Index constructor with sequence containing at least one NaT and everything else either None or NaN to infer datetime64[ns] dtype instead of object, matching Series behavior (GH 49340)
read_stata() with parameter index_col set to None (the default) will now set the index on the returned DataFrame to a RangeIndex instead of an Int64Index (GH 49745)
Changed behavior of Index, Series, and DataFrame arithmetic methods when working with object-dtypes, the results no longer do type inference on the result of the array operations, use result.infer_objects(copy=False) to do type inference on the result (GH 49999, GH 49714)
Changed behavior of Index constructor with an object-dtype numpy.ndarray containing all-bool values or all-complex values, this will now retain object dtype, consistent with the Series behavior (GH 49594)
Changed behavior of Series.astype() from object-dtype containing bytes objects to string dtypes; this now does val.decode() on bytes objects instead of str(val), matching Index.astype() behavior (GH 45326)
Added "None" to default na_values in read_csv() (GH 50286)
Changed behavior of Series and DataFrame constructors when given an integer dtype and floating-point data that is not round numbers, this now raises ValueError instead of silently retaining the float dtype; do Series(data) or DataFrame(data) to get the old behavior, and Series(data).astype(dtype) or DataFrame(data).astype(dtype) to get the specified dtype (GH 49599)
Changed behavior of DataFrame.shift() with axis=1, an integer fill_value, and homogeneous datetime-like dtype, this now fills new columns with integer dtypes instead of casting to datetimelike (GH 49842)
Files are now closed when encountering an exception in read_json() (GH 49921)
Changed behavior of read_csv(), read_json() & read_fwf(), where the index will now always be a RangeIndex, when no index is specified. Previously the index would be an Index with dtype object if the new DataFrame/Series has length 0 (GH 49572)
DataFrame.values(), DataFrame.to_numpy(), DataFrame.xs(), DataFrame.reindex(), DataFrame.fillna(), and DataFrame.replace() no longer silently consolidate the underlying arrays; do df = df.copy() to ensure consolidation (GH 49356)
Creating a new DataFrame using a full slice on both axes with loc or iloc (thus, df.loc[:, :] or df.iloc[:, :]) now returns a new DataFrame (shallow copy) instead of the original DataFrame, consistent with other methods to get a full slice (for example df.loc[:] or df[:]) (GH 49469)
The Series and DataFrame constructors will now return a shallow copy (i.e. share data, but not attributes) when passed a Series and DataFrame, respectively, and with the default of copy=False (and if no other keyword triggers a copy). Previously, the new Series or DataFrame would share the index attribute (e.g. df.index = ... would also update the index of the parent or child) (GH 49523)
Disallow computing cumprod for Timedelta object; previously this returned incorrect values (GH 50246)
DataFrame objects read from a HDFStore file without an index now have a RangeIndex instead of an int64 index (GH 51076)
Instantiating an Index with a numeric numpy dtype with data containing NA and/or NaT now raises a ValueError. Previously a TypeError was raised (GH 51050)
Loading a JSON file with duplicate columns using read_json(orient='split') renames columns to avoid duplicates, as read_csv() and the other readers do (GH 50370)
The levels of the index of the Series returned from Series.sparse.from_coo now always have dtype int32. Previously they had dtype int64 (GH 50926)
to_datetime() with unit of either “Y” or “M” will now raise if a sequence contains a non-round float value, matching the Timestamp behavior (GH 50301)
The methods Series.round(), DataFrame.__invert__(), Series.__invert__(), DataFrame.swapaxes(), DataFrame.first(), DataFrame.last(), Series.first(), Series.last() and DataFrame.align() will now always return new objects (GH 51032)
DataFrame and DataFrameGroupBy aggregations (e.g. “sum”) with object-dtype columns no longer infer non-object dtypes for their results, explicitly call result.infer_objects(copy=False) on the result to obtain the old behavior (GH 51205, GH 49603)
Division by zero with ArrowDtype dtypes returns -inf, nan, or inf depending on the numerator, instead of raising (GH 51541)
Added pandas.api.types.is_any_real_numeric_dtype() to check for real numeric dtypes (GH 51152)
value_counts() now returns data with ArrowDtype with pyarrow.int64 type instead of "Int64" type (GH 51462)
factorize() and unique() preserve the original dtype when passed numpy timedelta64 or datetime64 with non-nanosecond resolution (GH 48670)
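A few of the API changes listed above can be observed directly. The sketch below assumes pandas 2.0 or later; the sample data and variable names are illustrative only.

```python
import pandas as pd

# get_dummies() now produces bool columns by default (previously uint8).
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))

# Casting datetime64 data to a supported non-nanosecond unit keeps that unit
# instead of coercing back to datetime64[ns].
ser = pd.Series(pd.to_datetime(["2023-04-03 12:00:00"]))
seconds = ser.astype("datetime64[s]")

# An integer dtype combined with non-round floating-point data now raises
# ValueError instead of silently keeping the float dtype.
raised = False
try:
    pd.Series([1.5, 2.0], dtype="int64")
except ValueError:
    raised = True

print(dummies.dtypes)
print(seconds.dtype)
print(raised)
```

To get the earlier float result in the last case, omit the dtype argument (pd.Series([1.5, 2.0])), as the entry for GH 49599 describes.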
Note
A current PDEP proposes the deprecation and removal of the keywords inplace and copy
for all but a small subset of methods from the pandas API. The proposal is currently
under discussion. The keywords won’t be necessary
anymore in the context of Copy-on-Write. If this proposal is accepted, both
keywords would be deprecated in the next release of pandas and removed in pandas 3.0.
Deprecations#
Deprecated parsing datetime strings with system-local timezone to tzlocal, pass a tz keyword or explicitly call tz_localize instead (GH 50791)
Deprecated argument infer_datetime_format in to_datetime() and read_csv(), as a strict version of it is now the default (GH 48621)
Deprecated behavior of to_datetime() with unit when parsing strings, in a future version these will be parsed as datetimes (matching unit-less behavior) instead of cast to floats. To retain the old behavior, cast strings to numeric types before calling to_datetime() (GH 50735)
Deprecated pandas.io.sql.execute() (GH 50185)
Index.is_boolean() has been deprecated. Use pandas.api.types.is_bool_dtype() instead (GH 50042)
Index.is_integer() has been deprecated. Use pandas.api.types.is_integer_dtype() instead (GH 50042)
Index.is_floating() has been deprecated. Use pandas.api.types.is_float_dtype() instead (GH 50042)
Index.holds_integer() has been deprecated. Use pandas.api.types.infer_dtype() instead (GH 50243)
Index.is_numeric() has been deprecated. Use pandas.api.types.is_any_real_numeric_dtype() instead (GH 50042, GH 51152)
Index.is_categorical() has been deprecated. Use pandas.api.types.is_categorical_dtype() instead (GH 50042)
Index.is_object() has been deprecated. Use pandas.api.types.is_object_dtype() instead (GH 50042)
Index.is_interval() has been deprecated. Use pandas.api.types.is_interval_dtype() instead (GH 50042)
Deprecated argument date_parser in read_csv(), read_table(), read_fwf(), and read_excel() in favour of date_format (GH 50601)
Deprecated all and any reductions with datetime64 and DatetimeTZDtype dtypes, use e.g. (obj != pd.Timestamp(0, tz=obj.tz)).all() instead (GH 34479)
Deprecated unused arguments *args and **kwargs in Resampler (GH 50977)
Deprecated calling float or int on a single element Series to return a float or int respectively. Extract the element before calling float or int instead (GH 51101)
Deprecated Grouper.groups(), use Groupby.groups() instead (GH 51182)
Deprecated Grouper.grouper(), use Groupby.grouper() instead (GH 51182)
Deprecated Grouper.obj(), use Groupby.obj() instead (GH 51206)
Deprecated Grouper.indexer(), use Resampler.indexer() instead (GH 51206)
Deprecated Grouper.ax(), use Resampler.ax() instead (GH 51206)
Deprecated keyword use_nullable_dtypes in read_parquet(), use dtype_backend instead (GH 51853)
Deprecated Series.pad() in favor of Series.ffill() (GH 33396)
Deprecated Series.backfill() in favor of Series.bfill() (GH 33396)
Deprecated DataFrame.pad() in favor of DataFrame.ffill() (GH 33396)
Deprecated DataFrame.backfill() in favor of DataFrame.bfill() (GH 33396)
Deprecated StataReader.close(). Use StataReader as a context manager instead (GH 49228)
Deprecated producing a scalar when iterating over a DataFrameGroupBy or a SeriesGroupBy that has been grouped by a level parameter that is a list of length 1; a tuple of length one will be returned instead (GH 51583)
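Several of the deprecations above come with a direct replacement. The following sketch shows a handful of them side by side, assuming pandas 2.0 or later (pandas.api.types.is_any_real_numeric_dtype() was added in this release); the data and variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import (
    is_any_real_numeric_dtype,
    is_bool_dtype,
    is_integer_dtype,
)

idx = pd.Index([1, 2, 3])

# Instead of idx.is_integer(), idx.is_numeric() and idx.is_boolean():
idx_is_integer = is_integer_dtype(idx)
idx_is_numeric = is_any_real_numeric_dtype(idx)
idx_is_boolean = is_bool_dtype(idx)

# Instead of ser.pad(): forward-fill with ffill().
ser = pd.Series([1.0, None, 3.0])
filled = ser.ffill()

# Instead of float(single): extract the element first.
single = pd.Series([2.5])
value = float(single.iloc[0])

# Instead of dti.all(): compare against the epoch in the index's timezone.
dti = pd.DatetimeIndex(["2023-04-03"], tz="UTC")
all_nonzero = bool((dti != pd.Timestamp(0, tz=dti.tz)).all())

print(idx_is_integer, idx_is_numeric, idx_is_boolean)
print(filled.tolist(), value, all_nonzero)
```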
Removal of prior version deprecations/changes#
Removed Int64Index, UInt64Index and Float64Index. See also “Index can now hold numpy numeric dtypes” above for more information (GH 42717)
Removed deprecated Timestamp.freq, Timestamp.freqstr and argument freq from the Timestamp constructor and Timestamp.fromordinal() (GH 14146)
Removed deprecated CategoricalBlock, Block.is_categorical(), require datetime64 and timedelta64 values to be wrapped in DatetimeArray or TimedeltaArray before passing to Block.make_block_same_class(), require DatetimeTZBlock.values to have the correct ndim when passing to the BlockManager constructor, and removed the “fastpath” keyword from the SingleBlockManager constructor (GH 40226, GH 40571)
Removed deprecated global option use_inf_as_null in favor of use_inf_as_na (GH 17126)
Removed deprecated module pandas.core.index (GH 30193)
Removed deprecated alias pandas.core.tools.datetimes.to_time, import the function directly from pandas.core.tools.times instead (GH 34145)
Removed deprecated alias pandas.io.json.json_normalize, import the function directly from pandas.json_normalize instead (GH 27615)
Removed deprecated Categorical.to_dense(), use np.asarray(cat) instead (GH 32639)
Removed deprecated Categorical.take_nd() (GH 27745)
Removed deprecated Categorical.mode(), use Series(cat).mode() instead (GH 45033)
Removed deprecated Categorical.is_dtype_equal() and CategoricalIndex.is_dtype_equal() (GH 37545)
Removed deprecated CategoricalIndex.take_nd() (GH 30702)
Removed deprecated Index.is_type_compatible() (GH 42113)
Removed deprecated Index.is_mixed(), check index.inferred_type directly instead (GH 32922)
Removed deprecated pandas.api.types.is_categorical(); use pandas.api.types.is_categorical_dtype() instead (GH 33385)
Removed deprecated Index.asi8() (GH 37877)
Enforced deprecation changing behavior when passing datetime64[ns] dtype data and timezone-aware dtype to Series, interpreting the values as wall-times instead of UTC times, matching DatetimeIndex behavior (GH 41662)
Enforced deprecation changing behavior when applying a numpy ufunc on multiple non-aligned (on the index or columns) DataFrame that will now align the inputs first (GH 39239)
Removed deprecated DataFrame._AXIS_NUMBERS(), DataFrame._AXIS_NAMES(), Series._AXIS_NUMBERS(), Series._AXIS_NAMES() (GH 33637)
Removed deprecated Index.to_native_types(), use obj.astype(str) instead (GH 36418)
Removed deprecated Series.iteritems(), DataFrame.iteritems(), use obj.items instead (GH 45321)
Removed deprecated DataFrame.lookup() (GH 35224)
Removed deprecated Series.append(), DataFrame.append(), use concat() instead (GH 35407)
Removed deprecated Series.iteritems(), DataFrame.iteritems() and HDFStore.iteritems(), use obj.items instead (GH 45321)
Removed deprecated DatetimeIndex.union_many() (GH 45018)
Removed deprecated weekofyear and week attributes of DatetimeArray, DatetimeIndex and dt accessor in favor of isocalendar().week (GH 33595)
Removed deprecated RangeIndex._start(), RangeIndex._stop(), RangeIndex._step(), use start, stop, step instead (GH 30482)
Removed deprecated DatetimeIndex.to_perioddelta(), use dtindex - dtindex.to_period(freq).to_timestamp() instead (GH 34853)
Removed deprecated Styler.hide_index() and Styler.hide_columns() (GH 49397)
Removed deprecated Styler.set_na_rep() and Styler.set_precision() (GH 49397)
Removed deprecated Styler.where() (GH 49397)
Removed deprecated Styler.render() (GH 49397)
Removed deprecated argument col_space in DataFrame.to_latex() (GH 47970)
Removed deprecated argument null_color in Styler.highlight_null() (GH 49397)
Removed deprecated argument check_less_precise in testing.assert_frame_equal(), testing.assert_extension_array_equal(), testing.assert_series_equal(), testing.assert_index_equal() (GH 30562)
Removed deprecated null_counts argument in DataFrame.info(). Use show_counts instead (GH 37999)
Removed deprecated Index.is_monotonic(), and Series.is_monotonic(); use obj.is_monotonic_increasing instead (GH 45422)
Removed deprecated Index.is_all_dates() (GH 36697)
Enforced deprecation disallowing passing a timezone-aware Timestamp and dtype="datetime64[ns]" to Series or DataFrame constructors (GH 41555)
Enforced deprecation disallowing passing a sequence of timezone-aware values and dtype="datetime64[ns]" to Series or DataFrame constructors (GH 41555)
Enforced deprecation disallowing numpy.ma.mrecords.MaskedRecords in the DataFrame constructor; pass {name: data[name] for name in data.dtype.names} instead (GH 40363)
Enforced deprecation disallowing unit-less “datetime64” dtype in Series.astype() and DataFrame.astype() (GH 47844)
Enforced deprecation disallowing using .astype to convert a datetime64[ns] Series, DataFrame, or DatetimeIndex to timezone-aware dtype, use obj.tz_localize or ser.dt.tz_localize instead (GH 39258)
Enforced deprecation disallowing using .astype to convert a timezone-aware Series, DataFrame, or DatetimeIndex to timezone-naive datetime64[ns] dtype, use obj.tz_localize(None) or obj.tz_convert("UTC").tz_localize(None) instead (GH 39258)
Enforced deprecation disallowing passing non boolean argument to sort in concat() (GH 44629)
Removed Date parser functions parse_date_time(), parse_date_fields(), parse_all_fields() and generic_parser() (GH 24518)
Removed argument index from the core.arrays.SparseArray constructor (GH 43523)
Remove argument squeeze from DataFrame.groupby() and Series.groupby() (GH 32380)
Removed deprecated apply, apply_index, __call__, onOffset, and isAnchored attributes from DateOffset (GH 34171)
Removed keep_tz argument in DatetimeIndex.to_series() (GH 29731)
Remove arguments names and dtype from Index.copy() and levels and codes from MultiIndex.copy() (GH 35853, GH 36685)
Remove argument inplace from MultiIndex.set_levels() and MultiIndex.set_codes() (GH 35626)
Removed arguments verbose and encoding from DataFrame.to_excel() and Series.to_excel() (GH 47912)
Removed argument line_terminator from DataFrame.to_csv() and Series.to_csv(), use lineterminator instead (GH 45302)
Removed argument inplace from DataFrame.set_axis() and Series.set_axis(), use obj = obj.set_axis(..., copy=False) instead (GH 48130)
Disallow passing positional arguments to MultiIndex.set_levels() and MultiIndex.set_codes() (GH 41485)
Disallow parsing to Timedelta strings with components with units “Y”, “y”, or “M”, as these do not represent unambiguous durations (GH 36838)
Removed MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth() (GH 38701)
Removed argument how from PeriodIndex.astype(), use PeriodIndex.to_timestamp() instead (GH 37982)
Removed argument try_cast from DataFrame.mask(), DataFrame.where(), Series.mask() and Series.where() (GH 38836)
Removed argument tz from Period.to_timestamp(), use obj.to_timestamp(...).tz_localize(tz) instead (GH 34522)
Removed argument sort_columns in DataFrame.plot() and Series.plot() (GH 47563)
Removed argument is_copy from DataFrame.take() and Series.take() (GH 30615)
Removed argument kind from Index.get_slice_bound(), Index.slice_indexer() and Index.slice_locs() (GH 41378)
Removed arguments prefix, squeeze, error_bad_lines and warn_bad_lines from read_csv() (GH 40413, GH 43427)
Removed argument squeeze from read_excel() (GH 43427)
Removed argument datetime_is_numeric from DataFrame.describe() and Series.describe() as datetime data will always be summarized as numeric data (GH 34798)
Disallow passing list key to Series.xs() and DataFrame.xs(), pass a tuple instead (GH 41789)
Disallow subclass-specific keywords (e.g. “freq”, “tz”, “names”, “closed”) in the Index constructor (GH 38597)
Removed argument inplace from Categorical.remove_unused_categories() (GH 37918)
Disallow passing non-round floats to Timestamp with unit="M" or unit="Y" (GH 47266)
Remove keywords convert_float and mangle_dupe_cols from read_excel() (GH 41176)
Remove keyword mangle_dupe_cols from read_csv() and read_table() (GH 48137)
Removed errors keyword from DataFrame.where(), Series.where(), DataFrame.mask() and Series.mask() (GH 47728)
Disallow passing non-keyword arguments to read_excel() except io and sheet_name (GH 34418)
Disallow passing non-keyword arguments to DataFrame.drop() and Series.drop() except labels (GH 41486)
Disallow passing non-keyword arguments to DataFrame.fillna() and Series.fillna() except value (GH 41485)
Disallow passing non-keyword arguments to StringMethods.split() and StringMethods.rsplit() except for pat (GH 47448)
Disallow passing non-keyword arguments to DataFrame.set_index() except keys (GH 41495)
Disallow passing non-keyword arguments to Resampler.interpolate() except method (GH 41699)
Disallow passing non-keyword arguments to DataFrame.reset_index() and Series.reset_index() except level (GH 41496)
Disallow passing non-keyword arguments to DataFrame.dropna() and Series.dropna() (GH 41504)
Disallow passing non-keyword arguments to ExtensionArray.argsort() (GH 46134)
Disallow passing non-keyword arguments to Categorical.sort_values() (GH 47618)
Disallow passing non-keyword arguments to Index.drop_duplicates() and Series.drop_duplicates() (GH 41485)
Disallow passing non-keyword arguments to DataFrame.drop_duplicates() except for subset (GH 41485)
Disallow passing non-keyword arguments to DataFrame.sort_index() and Series.sort_index() (GH 41506)
Disallow passing non-keyword arguments to DataFrame.interpolate() and Series.interpolate() except for method (GH 41510)
Disallow passing non-keyword arguments to DataFrame.any() and Series.any() (GH 44896)
Disallow passing non-keyword arguments to Index.set_names() except for names (GH 41551)
Disallow passing non-keyword arguments to Index.join() except for other (GH 46518)
Disallow passing non-keyword arguments to concat() except for objs (GH 41485)
Disallow passing non-keyword arguments to pivot() except for data (GH 48301)
Disallow passing non-keyword arguments to DataFrame.pivot() (GH 48301)
Disallow passing non-keyword arguments to read_html() except for io (GH 27573)
Disallow passing non-keyword arguments to read_json() except for path_or_buf (GH 27573)
Disallow passing non-keyword arguments to read_sas() except for filepath_or_buffer (GH 47154)
Disallow passing non-keyword arguments to read_stata() except for filepath_or_buffer (GH 48128)
Disallow passing non-keyword arguments to read_csv() except filepath_or_buffer (GH 41485)
Disallow passing non-keyword arguments to read_table() except filepath_or_buffer (GH 41485)
Disallow passing non-keyword arguments to read_fwf() except filepath_or_buffer (GH 44710)
Disallow passing non-keyword arguments to read_xml() except for path_or_buffer (GH 45133)
Disallow passing non-keyword arguments to Series.mask() and DataFrame.mask() except cond and other (GH 41580)
Disallow passing non-keyword arguments to DataFrame.to_stata() except for path (GH 48128)
Disallow passing non-keyword arguments to DataFrame.where() and Series.where() except for cond and other (GH 41523)
Disallow passing non-keyword arguments to Series.set_axis() and DataFrame.set_axis() except for labels (GH 41491)
Disallow passing non-keyword arguments to Series.rename_axis() and DataFrame.rename_axis() except for mapper (GH 47587)
Disallow passing non-keyword arguments to Series.clip() and DataFrame.clip() except lower and upper (GH 41511)
Disallow passing non-keyword arguments to Series.bfill(), Series.ffill(), DataFrame.bfill() and DataFrame.ffill() (GH 41508)
Disallow passing non-keyword arguments to DataFrame.replace(), Series.replace() except for to_replace and value (GH 47587)
Disallow passing non-keyword arguments to DataFrame.sort_values() except for by (GH 41505)
Disallow passing non-keyword arguments to Series.sort_values() (GH 41505)
Disallow passing non-keyword arguments to DataFrame.reindex() except for labels (GH 17966)
Disallow Index.reindex() with non-unique Index objects (GH 42568)
Disallowed constructing Categorical with scalar data (GH 38433)
Disallowed constructing CategoricalIndex without passing data (GH 38944)
Removed Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH 43665)
Removed Rolling.win_type returning "freq" (GH 38963)
Removed Rolling.is_datetimelike (GH 38963)
Removed the level keyword in DataFrame and Series aggregations; use groupby instead (GH 39983)
Removed deprecated Timedelta.delta(), Timedelta.is_populated(), and Timedelta.freq (GH 46430, GH 46476)
Removed deprecated NaT.freq (GH 45071)
Removed deprecated Categorical.replace(), use Series.replace() instead (GH 44929)
Removed the numeric_only keyword from Categorical.min() and Categorical.max() in favor of skipna (GH 48821)
Changed behavior of DataFrame.median() and DataFrame.mean() with numeric_only=None to not exclude datetime-like columns (GH 29941)
Removed is_extension_type() in favor of is_extension_array_dtype() (GH 29457)
Removed ExponentialMovingWindow.vol (GH 39220)
Removed Index.get_value() and Index.set_value() (GH 33907, GH 28621)
Removed Series.slice_shift() and DataFrame.slice_shift() (GH 37601)
Remove DataFrameGroupBy.pad() and DataFrameGroupBy.backfill() (GH 45076)
Remove numpy argument from read_json() (GH 30636)
Disallow passing abbreviations for orient in DataFrame.to_dict() (GH 32516)
Disallow partial slicing on a non-monotonic DatetimeIndex with keys which are not in Index. This now raises a KeyError (GH 18531)
Removed get_offset in favor of to_offset() (GH 30340)
Removed the warn keyword in infer_freq() (GH 45947)
Removed the include_start and include_end arguments in DataFrame.between_time() in favor of inclusive (GH 43248)
Removed the closed argument in date_range() and bdate_range() in favor of inclusive argument (GH 40245)
Removed the center keyword in DataFrame.expanding() (GH 20647)
Removed the method and tolerance arguments in Index.get_loc(). Use index.get_indexer([label], method=..., tolerance=...) instead (GH 42269)
Removed the pandas.datetime submodule (GH 30489)
Removed the pandas.np submodule (GH 30296)
Removed pandas.util.testing in favor of pandas.testing (GH 30745)
Removed Series.str.__iter__() (GH 28277)
Removed pandas.SparseArray in favor of arrays.SparseArray (GH 30642)
Removed pandas.SparseSeries and pandas.SparseDataFrame, including pickle support. (GH 30642)
Enforced disallowing passing an integer fill_value to DataFrame.shift() and Series.shift() with datetime64, timedelta64, or period dtypes (GH 32591)
Enforced disallowing a string column label into times in DataFrame.ewm() (GH 43265)
Enforced disallowing passing True and False into inclusive in Series.between() in favor of "both" and "neither" respectively (GH 40628)
Enforced disallowing using usecols with out of bounds indices for read_csv with engine="c" (GH 25623)
Enforced disallowing the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)
Enforced disallowing a tuple of column labels into DataFrameGroupBy.__getitem__() (GH 30546)
Enforced disallowing missing labels when indexing with a sequence of labels on a level of a MultiIndex. This now raises a KeyError (GH 42351)
Enforced disallowing setting values with .loc using a positional slice. Use .loc with labels or .iloc with positions instead (GH 31840)
Enforced disallowing positional indexing with a float key even if that key is a round number, manually cast to integer instead (GH 34193)
Enforced disallowing using a DataFrame indexer with .iloc, use .loc instead for automatic alignment (GH 39022)
Enforced disallowing set or dict indexers in __getitem__ and __setitem__ methods (GH 42825)
Enforced disallowing indexing on an Index or positional indexing on a Series producing multi-dimensional objects e.g. obj[:, None], convert to numpy before indexing instead (GH 35141)
Enforced disallowing dict or set objects in suffixes in merge() (GH 34810)
Enforced disallowing merge() to produce duplicated columns through the suffixes keyword and already existing columns (GH 22818)
Enforced disallowing using merge() or join() on a different number of levels (GH 34862)
Enforced disallowing value_name argument in DataFrame.melt() to match an element in the DataFrame columns (GH 35003)
Enforced disallowing passing showindex into **kwargs in DataFrame.to_markdown() and Series.to_markdown() in favor of index (GH 33091)
Removed setting Categorical._codes directly (GH 41429)
Removed setting Categorical.categories directly (GH 47834)
Removed argument inplace from Categorical.add_categories(), Categorical.remove_categories(), Categorical.set_categories(), Categorical.rename_categories(), Categorical.reorder_categories(), Categorical.set_ordered(), Categorical.as_ordered(), Categorical.as_unordered() (GH 37981, GH 41118, GH 41133, GH 47834)
Enforced Rolling.count() with min_periods=None to default to the size of the window (GH 31302)
Renamed fname to path in DataFrame.to_parquet(), DataFrame.to_stata() and DataFrame.to_feather() (GH 30338)
Enforced disallowing indexing a Series with a single item list with a slice (e.g. ser[[slice(0, 2)]]). Either convert the list to tuple, or pass the slice directly instead (GH 31333)
Changed behavior indexing on a DataFrame with a DatetimeIndex index using a string indexer, previously this operated as a slice on rows, now it operates like any other column key; use frame.loc[key] for the old behavior (GH 36179)
Enforced the display.max_colwidth option to not accept negative integers (GH 31569)
Removed the display.column_space option in favor of df.to_string(col_space=...) (GH 47280)
Removed the deprecated method mad from pandas classes (GH 11787)
Removed the deprecated method tshift from pandas classes (GH 11631)
Changed behavior of empty data passed into Series; the default dtype will be object instead of float64 (GH 29405)
Changed the behavior of DatetimeIndex.union(), DatetimeIndex.intersection(), and DatetimeIndex.symmetric_difference() with mismatched timezones to convert to UTC instead of casting to object dtype (GH 39328)
Changed the behavior of to_datetime() with argument “now” with utc=False to match Timestamp("now") (GH 18705)
Changed the behavior of indexing on a timezone-aware DatetimeIndex with a timezone-naive datetime object or vice-versa; these now behave like any other non-comparable type by raising KeyError (GH 36148)
Changed the behavior of Index.reindex(), Series.reindex(), and DataFrame.reindex() with a datetime64 dtype and a datetime.date object for fill_value; these are no longer considered equivalent to datetime.datetime objects so the reindex casts to object dtype (GH 39767)
Changed behavior of SparseArray.astype() when given a dtype that is not explicitly SparseDtype, cast to the exact requested dtype rather than silently using a SparseDtype instead (GH 34457)
Changed behavior of Index.ravel() to return a view on the original Index instead of a np.ndarray (GH 36900)
Changed behavior of Series.to_frame() and Index.to_frame() with explicit name=None to use None for the column name instead of the index’s name or default 0 (GH 45523)
Changed behavior of concat() with one array of bool-dtype and another of integer dtype, this now returns object dtype instead of integer dtype; explicitly cast the bool object to integer before concatenating to get the old behavior (GH 45101)
Changed behavior of DataFrame constructor given floating-point data and an integer dtype, when the data cannot be cast losslessly, the floating point dtype is retained, matching Series behavior (GH 41170)
Changed behavior of Index constructor when given a np.ndarray with object-dtype containing numeric entries; this now retains object dtype rather than inferring a numeric dtype, consistent with Series behavior (GH 42870)
Changed behavior of Index.__and__(), Index.__or__() and Index.__xor__() to behave as logical operations (matching Series behavior) instead of aliases for set operations (GH 37374)
Changed behavior of DataFrame constructor when passed a list whose first element is a Categorical, this now treats the elements as rows casting to object dtype, consistent with behavior for other types (GH 38845)
Changed behavior of DataFrame constructor when passed a dtype (other than int) that the data cannot be cast to; it now raises instead of silently ignoring the dtype (GH 41733)
Changed the behavior of Series constructor, it will no longer infer a datetime64 or timedelta64 dtype from string entries (GH 41731)
Changed behavior of Timestamp constructor with a np.datetime64 object and a tz passed to interpret the input as a wall-time as opposed to a UTC time (GH 42288)
Changed behavior of Timestamp.utcfromtimestamp() to return a timezone-aware object satisfying Timestamp.utcfromtimestamp(val).timestamp() == val (GH 45083)
Changed behavior of Index constructor when passed a SparseArray or SparseDtype to retain that dtype instead of casting to numpy.ndarray (GH 43930)
Changed behavior of setitem-like operations (__setitem__, fillna, where, mask, replace, insert, fill_value for shift) on an object with DatetimeTZDtype when using a value with a non-matching timezone, the value will be cast to the object’s timezone instead of casting both to object-dtype (GH 44243)
Changed behavior of Index, Series, DataFrame constructors with floating-dtype data and a DatetimeTZDtype, the data are now interpreted as UTC-times instead of wall-times, consistent with how integer-dtype data are treated (GH 45573)
Changed behavior of Series and DataFrame constructors with integer dtype and floating-point data containing NaN, this now raises IntCastingNaNError (GH 40110)
Changed behavior of Series and DataFrame constructors with an integer dtype and values that are too large to losslessly cast to this dtype, this now raises ValueError (GH 41734)
Changed behavior of Series and DataFrame constructors with an integer dtype and values having either datetime64 or timedelta64 dtypes, this now raises TypeError, use values.view("int64") instead (GH 41770)
Removed the deprecated base and loffset arguments from pandas.DataFrame.resample(), pandas.Series.resample() and pandas.Grouper. Use offset or origin instead (GH 31809)
Changed behavior of Series.fillna() and DataFrame.fillna() with timedelta64[ns] dtype and an incompatible fill_value; this now casts to object dtype instead of raising, consistent with the behavior with other dtypes (GH 45746)
Change the default argument of regex for Series.str.replace() from True to False. Additionally, a single character pat with regex=True is now treated as a regular expression instead of a string literal. (GH 36695, GH 24804)
Changed behavior of DataFrame.any() and DataFrame.all() with bool_only=True; object-dtype columns with all-bool values will no longer be included, manually cast to bool dtype first (GH 46188)
Changed behavior of
DataFrame.max(),DataFrame.min,DataFrame.mean,DataFrame.median,DataFrame.skew,DataFrame.kurtwithaxis=Noneto return a scalar applying the aggregation across both axes (GH 45072)Changed behavior of comparison of a
Timestampwith adatetime.dateobject; these now compare as un-equal and raise on inequality comparisons, matching thedatetime.datetimebehavior (GH 36131)Changed behavior of comparison of
NaTwith adatetime.dateobject; these now raise on inequality comparisons (GH 39196)Enforced deprecation of silently dropping columns that raised a
TypeErrorinSeries.transformandDataFrame.transformwhen used with a list or dictionary (GH 43740)Changed behavior of
DataFrame.apply()with list-like so that any partial failure will raise an error (GH 43740)Changed behaviour of
DataFrame.to_latex()to now use the Styler implementation viaStyler.to_latex()(GH 47970)Changed behavior of
Series.__setitem__()with an integer key and aFloat64Indexwhen the key is not present in the index; previously we treated the key as positional (behaving likeseries.iloc[key] = val), now we treat it is a label (behaving likeseries.loc[key] = val), consistent withSeries.__getitem__`()behavior (GH 33469)Removed
na_sentinelargument fromfactorize(),Index.factorize(), andExtensionArray.factorize()(GH 47157)Changed behavior of
Series.diff()andDataFrame.diff()withExtensionDtypedtypes whose arrays do not implementdiff, these now raiseTypeErrorrather than casting to numpy (GH 31025)Enforced deprecation of calling numpy “ufunc”s on
DataFramewithmethod="outer"; this now raisesNotImplementedError(GH 36955)Enforced deprecation disallowing passing
numeric_only=TruetoSeriesreductions (rank,any,all, …) with non-numeric dtype (GH 47500)Changed behavior of
DataFrameGroupBy.apply()andSeriesGroupBy.apply()so thatgroup_keysis respected even if a transformer is detected (GH 34998)Comparisons between a
DataFrameand aSerieswhere the frame’s columns do not match the series’s index raiseValueErrorinstead of automatically aligning, doleft, right = left.align(right, axis=1, copy=False)before comparing (GH 36795)Enforced deprecation
numeric_only=None(the default) in DataFrame reductions that would silently drop columns that raised;numeric_onlynow defaults toFalse(GH 41480)Changed default of
numeric_onlytoFalsein all DataFrame methods with that argument (GH 46096, GH 46906)Changed default of
numeric_onlytoFalseinSeries.rank()(GH 47561)Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when
numeric_only=False(GH 41475)Enforced deprecation of silently dropping nuisance columns in
Rolling,Expanding, andExponentialMovingWindowops. This will now raise aerrors.DataError(GH 42834)Changed behavior in setting values with
df.loc[:, foo] = barordf.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333)Changed default of
numeric_onlyin variousDataFrameGroupBymethods; all methods now default tonumeric_only=False(GH 46072)Changed default of
numeric_onlytoFalseinResamplermethods (GH 47177)Using the method
DataFrameGroupBy.transform()with a callable that returns DataFrames will align to the input’s index (GH 47244)When providing a list of columns of length one to
DataFrame.groupby(), the keys that are returned by iterating over the resultingDataFrameGroupByobject will now be tuples of length one (GH 47761)Removed deprecated methods
ExcelWriter.write_cells(),ExcelWriter.save(),ExcelWriter.cur_sheet(),ExcelWriter.handles(),ExcelWriter.path()(GH 45795)The
ExcelWriterattributebookcan no longer be set; it is still available to be accessed and mutated (GH 48943)Removed unused
*argsand**kwargsinRolling,Expanding, andExponentialMovingWindowops (GH 47851)Removed the deprecated argument
line_terminatorfromDataFrame.to_csv()(GH 45302)Removed the deprecated argument
labelfromlreshape()(GH 30219)Arguments after
exprinDataFrame.eval()andDataFrame.query()are keyword-only (GH 47587)Removed
Index._get_attributes_dict()(GH 50648)Removed
Series.__array_wrap__()(GH 50648)Changed behavior of
DataFrame.value_counts()to return aSerieswithMultiIndexfor any list-like(one element or not) but anIndexfor a single label (GH 50829)
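Several of the changes listed above can be observed in a short session. The following is a minimal sketch rather than an official example; the frame and its column names (key, value, label) are made up for illustration.

```python
import pandas as pd

# Hypothetical data used only for this illustration.
df = pd.DataFrame(
    {"key": ["x", "x", "y"], "value": [1, 2, 3], "label": ["a", "b", "c"]}
)

# Grouping by a list of length one yields keys that are tuples of length one.
group_keys = [key for key, _ in df.groupby(["key"])]

# Series.str.replace() treats the pattern as a literal string by default.
literal = pd.Series(["a.b", "c.d"]).str.replace(".", "-")

# numeric_only now defaults to False, so non-numeric columns are no longer
# dropped silently; the reduction raises instead.
try:
    df.mean()
    mean_raised = False
except TypeError:
    mean_raised = True

# Opting in explicitly restores the numeric-only reduction.
numeric_mean = df.mean(numeric_only=True)

# axis=None applies the aggregation across both axes and returns a scalar.
overall_max = df[["value"]].max(axis=None)
```

Passing numeric_only=True explicitly, as in the last reduction, is the straightforward way to keep code that relied on the old silent dropping of columns.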
Performance improvements#
Performance improvement in DataFrameGroupBy.median() and SeriesGroupBy.median() and DataFrameGroupBy.cumprod() for nullable dtypes (GH 37493)
Performance improvement in DataFrameGroupBy.all(), DataFrameGroupBy.any(), SeriesGroupBy.all(), and SeriesGroupBy.any() for object dtype (GH 50623)
Performance improvement in MultiIndex.argsort() and MultiIndex.sort_values() (GH 48406)
Performance improvement in MultiIndex.size() (GH 48723)
Performance improvement in MultiIndex.union() without missing values and without duplicates (GH 48505, GH 48752)
Performance improvement in MultiIndex.difference() (GH 48606)
Performance improvement in MultiIndex set operations with sort=None (GH 49010)
Performance improvement in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.var(), and SeriesGroupBy.var() for extension array dtypes (GH 37493)
Performance improvement in MultiIndex.isin() when level=None (GH 48622, GH 49577)
Performance improvement in MultiIndex.putmask() (GH 49830)
Performance improvement in Index.union() and MultiIndex.union() when index contains duplicates (GH 48900)
Performance improvement in Series.rank() for pyarrow-backed dtypes (GH 50264)
Performance improvement in Series.searchsorted() for pyarrow-backed dtypes (GH 50447)
Performance improvement in Series.fillna() for extension array dtypes (GH 49722, GH 50078)
Performance improvement in Index.join(), Index.intersection() and Index.union() for masked and arrow dtypes when Index is monotonic (GH 50310, GH 51365)
Performance improvement for Series.value_counts() with nullable dtype (GH 48338)
Performance improvement for Series constructor passing integer numpy array with nullable dtype (GH 48338)
Performance improvement for DatetimeIndex constructor passing a list (GH 48609)
Performance improvement in merge() and DataFrame.join() when joining on a sorted MultiIndex (GH 48504)
Performance improvement in to_datetime() when parsing strings with timezone offsets (GH 50107)
Performance improvement in DataFrame.loc() and Series.loc() for tuple-based indexing of a MultiIndex (GH 48384)
Performance improvement for Series.replace() with categorical dtype (GH 49404)
Performance improvement for MultiIndex.unique() (GH 48335)
Performance improvement for indexing operations with nullable and arrow dtypes (GH 49420, GH 51316)
Performance improvement for concat() with extension array backed indexes (GH 49128, GH 49178)
Performance improvement for api.types.infer_dtype() (GH 51054)
Reduce memory usage of DataFrame.to_pickle()/Series.to_pickle() when using BZ2 or LZMA (GH 49068)
Performance improvement for StringArray constructor passing a numpy array with type np.str_ (GH 49109)
Performance improvement in from_tuples() (GH 50620)
Performance improvement in factorize() (GH 49177)
Performance improvement in __setitem__() (GH 50248, GH 50632)
Performance improvement in ArrowExtensionArray comparison methods when array contains NA (GH 50524)
Performance improvement when parsing strings to BooleanDtype (GH 50613)
Performance improvement in DataFrame.join() when joining on a subset of a MultiIndex (GH 48611)
Performance improvement for MultiIndex.intersection() (GH 48604)
Performance improvement in DataFrame.__setitem__() (GH 46267)
Performance improvement in var and std for nullable dtypes (GH 48379)
Performance improvement when iterating over pyarrow and nullable dtypes (GH 49825, GH 49851)
Performance improvements to read_sas() (GH 47403, GH 47405, GH 47656, GH 48502)
Memory improvement in RangeIndex.sort_values() (GH 48801)
Performance improvement in Series.to_numpy() if copy=True by avoiding copying twice (GH 24345)
Performance improvement in Series.rename() with MultiIndex (GH 21055)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and sort=False (GH 48976)
Performance improvement in DataFrameGroupBy and SeriesGroupBy when by is a categorical type and observed=False (GH 49596)
Performance improvement in read_stata() with parameter index_col set to None (the default). Now the index will be a RangeIndex instead of Int64Index (GH 49745)
Performance improvement in merge() when not merging on the index - the new index will now be RangeIndex instead of Int64Index (GH 49478)
Performance improvement in DataFrame.to_dict() and Series.to_dict() when using any non-object dtypes (GH 46470)
Performance improvement in read_html() when there are multiple tables (GH 49929)
Performance improvement in Period constructor when constructing from a string or integer (GH 38312)
Performance improvement in to_datetime() when using '%Y%m%d' format (GH 17410)
Performance improvement in to_datetime() when format is given or can be inferred (GH 50465)
Performance improvement in Series.median() for nullable dtypes (GH 50838)
Performance improvement in read_csv() when passing to_datetime() lambda-function to date_parser and inputs have mixed timezone offsets (GH 35296)
Performance improvement in SeriesGroupBy.value_counts() with categorical dtype (GH 46202)
Fixed a reference leak in read_hdf() (GH 37441)
Fixed a memory leak in DataFrame.to_json() and Series.to_json() when serializing datetimes and timedeltas (GH 40443)
Decreased memory usage in many DataFrameGroupBy methods (GH 51090)
Performance improvement in DataFrame.round() for an integer decimal parameter (GH 17254)
Performance improvement in DataFrame.replace() and Series.replace() when using a large dict for to_replace (GH 6697)
Memory improvement in StataReader when reading seekable files (GH 48922)
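One of the entries above has a visible side effect: a merge() on columns now produces a RangeIndex. A minimal sketch, assuming two small frames whose names and contents are invented for illustration:

```python
import pandas as pd

# Hypothetical frames sharing a "key" column.
left = pd.DataFrame({"key": [1, 2, 3], "a": [10, 20, 30]})
right = pd.DataFrame({"key": [2, 3, 4], "b": [200, 300, 400]})

# Merging on a column (not on the index) builds a fresh default index.
merged = left.merge(right, on="key")
index_type = type(merged.index).__name__
print(index_type)
```

A RangeIndex stores only start, stop and step, which is why it is cheaper than materializing an integer index of the same length.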
Bug fixes#
Categorical#
Bug in Categorical.set_categories() losing dtype information (GH 48812)
Bug in Series.replace() with categorical dtype when to_replace values overlap with new values (GH 49404)
Bug in Series.replace() with categorical dtype losing nullable dtypes of underlying categories (GH 49404)
Bug in DataFrame.groupby() and Series.groupby() would reorder categories when used as a grouper (GH 48749)
Bug in Categorical constructor when constructing from a Categorical object and dtype="category" losing ordered-ness (GH 49309)
Bug in SeriesGroupBy.min(), SeriesGroupBy.max(), DataFrameGroupBy.min(), and DataFrameGroupBy.max() with unordered CategoricalDtype with no groups failing to raise TypeError (GH 51034)
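The ordered-ness fix above (GH 49309) can be checked directly. This is a small sketch with made-up category labels, not an excerpt from the pandas test suite:

```python
import pandas as pd

# An ordered categorical with hypothetical labels.
ordered_cat = pd.Categorical(
    ["low", "high", "low"], categories=["low", "high"], ordered=True
)

# Re-wrapping it with the generic "category" dtype keeps the ordering.
rebuilt = pd.Categorical(ordered_cat, dtype="category")
print(rebuilt.ordered)
```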
Datetimelike#
Bug in pandas.infer_freq(), raising TypeError when inferred on RangeIndex (GH 47084)
Bug in to_datetime() incorrectly raising OverflowError with string arguments corresponding to large integers (GH 50533)
Bug in to_datetime() was raising on invalid offsets with errors='coerce' and infer_datetime_format=True (GH 48633)
Bug in DatetimeIndex constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware dtype or data (GH 48659)
Bug in subtracting a datetime scalar from DatetimeIndex failing to retain the original freq attribute (GH 48818)
Bug in pandas.tseries.holiday.Holiday where a half-open date interval causes inconsistent return types from USFederalHolidayCalendar.holidays() (GH 49075)
Bug in rendering DatetimeIndex and Series and DataFrame with timezone-aware dtypes with dateutil or zoneinfo timezones near daylight-savings transitions (GH 49684)
Bug in to_datetime() was raising ValueError when parsing Timestamp, datetime.datetime, datetime.date, or np.datetime64 objects when non-ISO8601 format was passed (GH 49298, GH 50036)
Bug in to_datetime() was raising ValueError when parsing empty string and non-ISO8601 format was passed. Now, empty strings will be parsed as NaT, for compatibility with how this is done for ISO8601 formats (GH 50251)
Bug in Timestamp was showing UserWarning, which was not actionable by users, when parsing non-ISO8601 delimited date strings (GH 50232)
Bug in to_datetime() was showing misleading ValueError when parsing dates with format containing ISO week directive and ISO weekday directive (GH 50308)
Bug in Timestamp.round() when the freq argument has zero-duration (e.g. “0ns”) returning incorrect results instead of raising (GH 49737)
Bug in to_datetime() was not raising ValueError when invalid format was passed and errors was 'ignore' or 'coerce' (GH 50266)
Bug in DateOffset was throwing TypeError when constructing with milliseconds and another super-daily argument (GH 49897)
Bug in to_datetime() was not raising ValueError when parsing string with decimal date with format '%Y%m%d' (GH 50051)
Bug in to_datetime() was not converting None to NaT when parsing mixed-offset date strings with ISO8601 format (GH 50071)
Bug in to_datetime() was not returning input when parsing out-of-bounds date string with errors='ignore' and format='%Y%m%d' (GH 14487)
Bug in to_datetime() was converting timezone-naive datetime.datetime to timezone-aware when parsing with timezone-aware strings, ISO8601 format, and utc=False (GH 50254)
Bug in to_datetime() was throwing ValueError when parsing dates with ISO8601 format where some values were not zero-padded (GH 21422)
Bug in to_datetime() was giving incorrect results when using format='%Y%m%d' and errors='ignore' (GH 26493)
Bug in to_datetime() was failing to parse date strings 'today' and 'now' if format was not ISO8601 (GH 50359)
Bug in Timestamp.utctimetuple() raising a TypeError (GH 32174)
Bug in to_datetime() was raising ValueError when parsing mixed-offset Timestamp with errors='ignore' (GH 50585)
Bug in to_datetime() was incorrectly handling floating-point inputs within 1 unit of the overflow boundaries (GH 50183)
Bug in to_datetime() with unit of “Y” or “M” giving incorrect results, not matching pointwise Timestamp results (GH 50870)
Bug in Series.interpolate() and DataFrame.interpolate() with datetime or timedelta dtypes incorrectly raising ValueError (GH 11312)
Bug in to_datetime() was not returning input with errors='ignore' when input was out-of-bounds (GH 50587)
Bug in DataFrame.from_records() when given a DataFrame input with timezone-aware datetime64 columns incorrectly dropping the timezone-awareness (GH 51162)
Bug in to_datetime() was raising decimal.InvalidOperation when parsing date strings with errors='coerce' (GH 51084)
Bug in to_datetime() with both unit and origin specified returning incorrect results (GH 42624)
Bug in Series.astype() and DataFrame.astype() when converting an object-dtype object containing timezone-aware datetimes or strings to datetime64[ns] incorrectly localizing as UTC instead of raising TypeError (GH 50140)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() with datetime or timedelta dtypes giving incorrect results for groups containing NaT (GH 51373)
Bug in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() incorrectly raising with PeriodDtype or DatetimeTZDtype (GH 51373)
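The empty-string handling described above (GH 50251) is easy to reproduce. A minimal sketch, assuming a day-first format chosen only for illustration:

```python
import pandas as pd

# A non-ISO8601 format; the first entry is an empty string.
parsed = pd.to_datetime(["", "02/01/2023"], format="%d/%m/%Y")

# The empty string becomes NaT instead of raising ValueError.
first, second = parsed[0], parsed[1]
print(parsed)
```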
Timedelta#
Bug in to_timedelta() raising error when input has nullable dtype Float64 (GH 48796)
Bug in Timedelta constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (GH 48898)
Bug in Timedelta constructor failing to raise when passed both a Timedelta object and keywords (e.g. days, seconds) (GH 48898)
Bug in Timedelta comparisons with very large datetime.timedelta objects incorrectly raising OutOfBoundsTimedelta (GH 49021)
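As a quick check of the NaT handling above (GH 48898), the constructor can be called with a missing numpy timedelta. This is only a sketch of the fixed behavior:

```python
import numpy as np
import pandas as pd

# A missing numpy timedelta now maps to pandas' NaT singleton.
result = pd.Timedelta(np.timedelta64("NaT"))
print(result)
```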
Timezones#
Bug in Series.astype() and DataFrame.astype() with object-dtype containing multiple timezone-aware datetime objects with heterogeneous timezones to a DatetimeTZDtype incorrectly raising (GH 32581)
Bug in to_datetime() was failing to parse date strings with timezone name when format was specified with %Z (GH 49748)
Better error message when passing invalid values to ambiguous parameter in Timestamp.tz_localize() (GH 49565)
Bug in string parsing incorrectly allowing a Timestamp to be constructed with an invalid timezone, which would raise when trying to print (GH 50668)
Corrected TypeError message in objects_to_datetime64ns() to inform that DatetimeIndex has mixed timezones (GH 50974)
Numeric#
Bug in DataFrame.add() cannot apply ufunc when inputs contain mixed DataFrame type and Series type (GH 39853)
Bug in arithmetic operations on Series not propagating mask when combining masked dtypes and numpy dtypes (GH 45810, GH 42630)
Bug in DataFrame.sem() and Series.sem() where an erroneous TypeError would always raise when using data backed by an ArrowDtype (GH 49759)
Bug in Series.__add__() casting to object for list and masked Series (GH 22962)
Bug in mode() where dropna=False was not respected when there were NA values (GH 50982)
Bug in DataFrame.query() with engine="numexpr" where column names min or max would raise a TypeError (GH 50937)
Bug in DataFrame.min() and DataFrame.max() with tz-aware data containing pd.NaT and axis=1 would return incorrect results (GH 51242)
Conversion#
Bug in constructing Series with int64 dtype from a string list raising instead of casting (GH 44923)
Bug in constructing Series with masked dtype and boolean values with NA raising (GH 42137)
Bug in DataFrame.eval() incorrectly raising an AttributeError when there are negative values in function call (GH 46471)
Bug in Series.convert_dtypes() not converting dtype to nullable dtype when Series contains NA and has dtype object (GH 48791)
Bug where any ExtensionDtype subclass with kind="M" would be interpreted as a timezone type (GH 34986)
Bug in arrays.ArrowExtensionArray that would raise NotImplementedError when passed a sequence of strings or binary (GH 49172)
Bug in Series.astype() raising pyarrow.ArrowInvalid when converting from a non-pyarrow string dtype to a pyarrow numeric type (GH 50430)
Bug in DataFrame.astype() modifying input array inplace when converting to string and copy=False (GH 51073)
Bug in Series.to_numpy() converting to NumPy array before applying na_value (GH 48951)
Bug in DataFrame.astype() not copying data when converting to pyarrow dtype (GH 50984)
Bug in to_datetime() was not respecting exact argument when format was an ISO8601 format (GH 12649)
Bug in TimedeltaArray.astype() raising TypeError when converting to a pyarrow duration type (GH 49795)
Bug in DataFrame.eval() and DataFrame.query() raising for extension array dtypes (GH 29618, GH 50261, GH 31913)
Bug in Series() not copying data when created from Index and dtype is equal to dtype from Index (GH 52008)
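The last fix in this list (GH 52008) concerns shared memory between an Index and a Series built from it. A short sketch of the expected behavior, using made-up values:

```python
import pandas as pd

idx = pd.Index([1, 2, 3])

# Build a Series from the Index with the same dtype, then modify the Series.
ser = pd.Series(idx, dtype=idx.dtype)
ser.iloc[0] = 100

# The Index is unaffected because the data were copied.
print(idx[0], ser.iloc[0])
```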
Strings#
Bug in pandas.api.types.is_string_dtype() that would not return True for StringDtype or ArrowDtype with pyarrow.string() (GH 15585)
Bug in converting string dtypes to “datetime64[ns]” or “timedelta64[ns]” incorrectly raising TypeError (GH 36153)
Bug in setting values in a string-dtype column with an array, mutating the array as side effect when it contains missing values (GH 51299)
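For the is_string_dtype() entry above (GH 15585), a minimal check with the nullable string dtype might look like this; the pyarrow-backed case is left out so that the sketch does not depend on pyarrow being installed:

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# A Series using the nullable "string" extension dtype.
ser = pd.Series(["a", "b"], dtype="string")

# The dtype is recognized as a string dtype.
check = is_string_dtype(ser.dtype)
print(check)
```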
Interval#
Bug in IntervalIndex.is_overlapping() incorrect output if interval has duplicate left boundaries (GH 49581)
Bug in Series.infer_objects() failing to infer IntervalDtype for an object series of Interval objects (GH 50090)
Bug in Series.shift() with IntervalDtype and invalid null fill_value failing to raise TypeError (GH 51258)
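The infer_objects() fix above (GH 50090) can be illustrated with an object-dtype Series of intervals. The interval endpoints below are arbitrary:

```python
import pandas as pd

# Interval objects stored with object dtype.
ser = pd.Series([pd.Interval(0, 1), pd.Interval(1, 2)], dtype=object)

# infer_objects() now recovers an interval dtype.
inferred = ser.infer_objects()
print(inferred.dtype)
```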
Indexing#
Bug in DataFrame.__setitem__() raising when indexer is a DataFrame with boolean dtype (GH 47125)
Bug in DataFrame.reindex() filling with wrong values when indexing columns and index for uint dtypes (GH 48184)
Bug in DataFrame.loc() when setting DataFrame with different dtypes coercing values to single dtype (GH 50467)
Bug in DataFrame.sort_values() where None was not returned when by is empty list and inplace=True (GH 50643)
Bug in DataFrame.loc() coercing dtypes when setting values with a list indexer (GH 49159)
Bug in Series.loc() raising error for out of bounds end of slice indexer (GH 50161)
Bug in DataFrame.loc() raising ValueError with all False bool indexer and empty object (GH 51450)
Bug in DataFrame.loc() raising ValueError with bool indexer and MultiIndex (GH 47687)
Bug in DataFrame.loc() raising IndexError when setting values for a pyarrow-backed column with a non-scalar indexer (GH 50085)
Bug in DataFrame.__getitem__(), Series.__getitem__(), DataFrame.__setitem__() and Series.__setitem__() when indexing on indexes with extension float dtypes (Float64 & Float64) or complex dtypes using integers (GH 51053)
Bug in DataFrame.loc() modifying object when setting incompatible value with an empty indexer (GH 45981)
Bug in DataFrame.__setitem__() raising ValueError when right hand side is DataFrame with MultiIndex columns (GH 49121)
Bug in DataFrame.reindex() casting dtype to object when DataFrame has single extension array column when re-indexing columns and index (GH 48190)
Bug in DataFrame.iloc() raising IndexError when indexer is a Series with numeric extension array dtype (GH 49521)
Bug in describe() when formatting percentiles in the resulting index showed more decimals than needed (GH 46362)
Bug in DataFrame.compare() does not recognize differences when comparing NA with value in nullable dtypes (GH 48939)
Bug in Series.rename() with MultiIndex losing extension array dtypes (GH 21055)
Bug in DataFrame.isetitem() coercing extension array dtypes in DataFrame to object (GH 49922)
Bug in Series.__getitem__() returning corrupt object when selecting from an empty pyarrow backed object (GH 51734)
Bug in BusinessHour would cause creation of DatetimeIndex to fail when no opening hour was included in the index (GH 49835)
Missing#
Bug in Index.equals() raising TypeError when Index consists of tuples that contain NA (GH 48446)
Bug in Series.map() caused incorrect result when data has NaNs and defaultdict mapping was used (GH 48813)
Bug in NA raising a TypeError instead of returning NA when performing a binary operation with a bytes object (GH 49108)
Bug in DataFrame.update() with overwrite=False raising TypeError when self has column with NaT values and column not present in other (GH 16713)
Bug in Series.replace() raising RecursionError when replacing value in object-dtype Series containing NA (GH 47480)
Bug in Series.replace() raising RecursionError when replacing value in numeric Series with NA (GH 50758)
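The NA and bytes entry above (GH 49108) amounts to NA propagating through the operation. A one-line sketch with an arbitrary bytes value:

```python
import pandas as pd

# A binary operation between NA and a bytes object propagates NA.
result = pd.NA + b"abc"
print(result)
```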
MultiIndex#
Bug in MultiIndex.get_indexer() not matching NaN values (GH 29252, GH 37222, GH 38623, GH 42883, GH 43222, GH 46173, GH 48905)
Bug in MultiIndex.argsort() raising TypeError when index contains NA (GH 48495)
Bug in MultiIndex.difference() losing extension array dtype (GH 48606)
Bug in MultiIndex.set_levels raising IndexError when setting empty level (GH 48636)
Bug in MultiIndex.unique() losing extension array dtype (GH 48335)
Bug in MultiIndex.intersection() losing extension array (GH 48604)
Bug in MultiIndex.union() losing extension array (GH 48498, GH 48505, GH 48900)
Bug in MultiIndex.union() not sorting when sort=None and index contains missing values (GH 49010)
Bug in MultiIndex.append() not checking names for equality (GH 48288)
Bug in MultiIndex.symmetric_difference() losing extension array (GH 48607)
Bug in MultiIndex.join() losing dtypes when MultiIndex has duplicates (GH 49830)
Bug in MultiIndex.putmask() losing extension array (GH 49830)
Bug in MultiIndex.value_counts() returning a Series indexed by flat index of tuples instead of a MultiIndex (GH 49558)
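The value_counts() fix above (GH 49558) changes the index type of the result. A minimal sketch; the level names letter and number are hypothetical:

```python
import pandas as pd

mi = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 1), ("b", 2)], names=["letter", "number"]
)

# The counts are indexed by a MultiIndex rather than a flat index of tuples.
counts = mi.value_counts()
print(type(counts.index).__name__)
```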
I/O#
Bug in read_sas() caused fragmentation of DataFrame and raised errors.PerformanceWarning (GH 48595)
Improved error message in read_excel() by including the offending sheet name when an exception is raised while reading a file (GH 48706)
Bug when pickling a subset of PyArrow-backed data that would serialize the entire data instead of the subset (GH 42600)
Bug in read_sql_query() ignoring dtype argument when chunksize is specified and result is empty (GH 50245)
Bug in read_csv() for a single-line csv with fewer columns than names raised errors.ParserError with engine="c" (GH 47566)
Bug in read_json() raising with orient="table" and NA value (GH 40255)
Bug in displaying string dtypes not showing storage option (GH 50099)
Bug in DataFrame.to_string() with header=False that printed the index name on the same line as the first row of the data (GH 49230)
Bug in DataFrame.to_string() ignoring float formatter for extension arrays (GH 39336)
Fixed memory leak which stemmed from the initialization of the internal JSON module (GH 49222)
Fixed issue where json_normalize() would incorrectly remove leading characters from column names that matched the sep argument (GH 49861)
Bug in read_csv() unnecessarily overflowing for extension array dtype when containing NA (GH 32134)
Bug in DataFrame.to_dict() not converting NA to None (GH 50795)
Bug in DataFrame.to_json() where it would segfault when failing to encode a string (GH 50307)
Bug in DataFrame.to_html() with na_rep set when the DataFrame contains non-scalar data (GH 47103)
Bug in read_xml() where file-like objects failed when iterparse is used (GH 50641)
Bug in read_csv() when engine="pyarrow" where encoding parameter was not handled correctly (GH 51302)
Bug in read_xml() ignored repeated elements when iterparse is used (GH 51183)
Bug in ExcelWriter leaving file handles open if an exception occurred during instantiation (GH 51443)
Bug in DataFrame.to_parquet() where non-string index or columns were raising a ValueError when engine="pyarrow" (GH 52036)
Period#
Bug in Period.strftime() and PeriodIndex.strftime(), raising UnicodeDecodeError when a locale-specific directive was passed (GH 46319)
Bug in adding a Period object to an array of DateOffset objects incorrectly raising TypeError (GH 50162)
Bug in Period where passing a string with finer resolution than nanosecond would result in a KeyError instead of dropping the extra precision (GH 50417)
Bug in parsing strings representing Week-periods e.g. “2017-01-23/2017-01-29” as minute-frequency instead of week-frequency (GH 50803)
Bug in DataFrameGroupBy.sum(), DataFrameGroupBy.cumsum(), DataFrameGroupBy.prod(), DataFrameGroupBy.cumprod() with PeriodDtype failing to raise TypeError (GH 51040)
Bug in parsing empty string with Period incorrectly raising ValueError instead of returning NaT (GH 51349)
Plotting#
Bug in DataFrame.plot.hist(), not dropping elements of weights corresponding to NaN values in data (GH 48884)
ax.set_xlim was sometimes raising UserWarning which users couldn’t address due to set_xlim not accepting parsing arguments - the converter now uses Timestamp() instead (GH 49148)
Groupby/resample/rolling#
Bug in ExponentialMovingWindow with online not raising a NotImplementedError for unsupported operations (GH 48834)
Bug in DataFrameGroupBy.sample() raises ValueError when the object is empty (GH 48459)
Bug in Series.groupby() raises ValueError when an entry of the index is equal to the name of the index (GH 48567)
Bug in DataFrameGroupBy.resample() produces inconsistent results when passing empty DataFrame (GH 47705)
Bug in DataFrameGroupBy and SeriesGroupBy would not include unobserved categories in result when grouping by categorical indexes (GH 49354)
Bug in DataFrameGroupBy and SeriesGroupBy would change result order depending on the input index when grouping by categoricals (GH 49223)
Bug in DataFrameGroupBy and SeriesGroupBy when grouping on categorical data would sort result values even when used with sort=False (GH 42482)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply with as_index=False would not attempt the computation without using the grouping keys when using them failed with a TypeError (GH 49256)
Bug in DataFrameGroupBy.describe() would describe the group keys (GH 49256)
Bug in SeriesGroupBy.describe() with as_index=False would have the incorrect shape (GH 49256)
Bug in DataFrameGroupBy and SeriesGroupBy with dropna=False would drop NA values when the grouper was categorical (GH 36327)
Bug in SeriesGroupBy.nunique() would incorrectly raise when the grouper was an empty categorical and observed=True (GH 21334)
Bug in SeriesGroupBy.nth() would raise when grouper contained NA values after subsetting from a DataFrameGroupBy (GH 26454)
Bug in DataFrame.groupby() would not include a Grouper specified by key in the result when as_index=False (GH 50413)
Bug in DataFrameGroupBy.value_counts() would raise when used with a TimeGrouper (GH 50486)
Bug in Resampler.size() caused a wide DataFrame to be returned instead of a Series with MultiIndex (GH 46826)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when grouper had axis=1 for "idxmin" and "idxmax" arguments (GH 45986)
Bug in DataFrameGroupBy would raise when used with an empty DataFrame, categorical grouper, and dropna=False (GH 50634)
Bug in SeriesGroupBy.value_counts() did not respect sort=False (GH 50482)
Bug in DataFrameGroupBy.resample() raises KeyError when getting the result from a key list when resampling on time index (GH 50840)
Bug in DataFrameGroupBy.transform() and SeriesGroupBy.transform() would raise incorrectly when grouper had axis=1 for "ngroup" argument (GH 45986)
Bug in DataFrameGroupBy.describe() produced incorrect results when data had duplicate columns (GH 50806)
Bug in DataFrameGroupBy.agg() with engine="numba" failing to respect as_index=False (GH 51228)
Bug in DataFrameGroupBy.agg(), SeriesGroupBy.agg(), and Resampler.agg() would ignore arguments when passed a list of functions (GH 50863)
Bug in DataFrameGroupBy.ohlc() ignoring as_index=False (GH 51413)
Bug in DataFrameGroupBy.agg() after subsetting columns (e.g. .groupby(...)[["a", "b"]]) would not include groupings in the result (GH 51186)
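The final entry above (GH 51186) describes groupings missing from the result after column subsetting. The sketch below shows the expected shape of the result with as_index=False; the frame is invented, and whether this exact call triggered the original bug is an assumption:

```python
import pandas as pd

# Hypothetical frame with one grouping column and two value columns.
df = pd.DataFrame({"k": ["x", "x", "y"], "a": [1, 2, 3], "b": [4, 5, 6]})

# Subset the columns, then aggregate; the grouping column is kept as a column.
result = df.groupby("k", as_index=False)[["a", "b"]].agg("sum")
print(result)
```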
Reshaping#
Bug in DataFrame.pivot_table() raising TypeError for nullable dtype and margins=True (GH 48681)
Bug in DataFrame.unstack() and Series.unstack() unstacking wrong level of MultiIndex when MultiIndex has mixed names (GH 48763)
Bug in DataFrame.melt() losing extension array dtype (GH 41570)
Bug in DataFrame.pivot() not respecting None as column name (GH 48293)
Bug in DataFrame.join() when left_on or right_on is or includes a CategoricalIndex incorrectly raising AttributeError (GH 48464)
Bug in DataFrame.pivot_table() raising ValueError with parameter margins=True when result is an empty DataFrame (GH 49240)
Clarified error message in merge() when passing invalid validate option (GH 49417)
Bug in DataFrame.explode() raising ValueError on multiple columns with NaN values or empty lists (GH 46084)
Bug in DataFrame.transpose() with IntervalDtype column with timedelta64[ns] endpoints (GH 44917)
Bug in DataFrame.agg() and Series.agg() would ignore arguments when passed a list of functions (GH 50863)
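The melt() fix above (GH 41570) means nullable columns keep their dtype after reshaping. A minimal sketch with two made-up Int64 columns:

```python
import pandas as pd

# Two nullable-integer columns, one containing a missing value.
df = pd.DataFrame(
    {
        "a": pd.array([1, 2], dtype="Int64"),
        "b": pd.array([3, None], dtype="Int64"),
    }
)

# The melted "value" column retains the extension dtype.
melted = df.melt()
print(melted["value"].dtype)
```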
Sparse#
Bug in Series.astype() when converting a SparseDtype with datetime64[ns] subtype to int64 dtype raising, inconsistent with the non-sparse behavior (GH 49631, GH 50087)
Bug in Series.astype() when converting from datetime64[ns] to Sparse[datetime64[ns]] incorrectly raising (GH 50082)
Bug in Series.sparse.to_coo() raising SystemError when MultiIndex contains an ExtensionArray (GH 50996)
ExtensionArray#
Bug in Series.mean() overflowing unnecessarily with nullable integers (GH 48378)
Bug in Series.tolist() for nullable dtypes returning numpy scalars instead of python scalars (GH 49890)
Bug in Series.round() for pyarrow-backed dtypes raising AttributeError (GH 50437)
Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, the resulting dtype turned into object (GH 48510)
Bug in array.PandasArray.to_numpy() raising with NA value when na_value is specified (GH 40638)
Bug in api.types.is_numeric_dtype() where a custom ExtensionDtype would not return True if _is_numeric returned True (GH 50563)
Bug in api.types.is_integer_dtype(), api.types.is_unsigned_integer_dtype(), api.types.is_signed_integer_dtype(), api.types.is_float_dtype() where a custom ExtensionDtype would not return True if kind returned the corresponding NumPy type (GH 50667)
Bug in Series constructor unnecessarily overflowing for nullable unsigned integer dtypes (GH 38798, GH 25880)
Bug in setting non-string value into StringArray raising ValueError instead of TypeError (GH 49632)
Bug in DataFrame.reindex() not honoring the default copy=True keyword in case of columns with ExtensionDtype (and as a result also selecting multiple columns with getitem ([]) didn’t correctly result in a copy) (GH 51197)
Bug in ArrowExtensionArray logical operations & and | raising KeyError (GH 51688)
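The Series.tolist() entry can be checked with a few lines. This sketch assumes pandas 2.0 or later and uses a made-up nullable Int64 series without missing values. It only inspects the element types and does not attempt to show the earlier numpy-scalar behaviour.

```python
import pandas as pd

# A nullable integer series (masked "Int64" dtype), chosen for illustration.
ser = pd.Series([1, 2, 3], dtype="Int64")

values = ser.tolist()

# After the fix, the list elements should be plain Python ints.
element_types = {type(v) for v in values}
print(values, element_types)
```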
Styler#
Fix background_gradient() for nullable dtype Series with NA values (GH 50712)
Metadata#
Fixed metadata propagation in DataFrame.corr() and DataFrame.cov() (GH 28283)
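One way to observe this propagation is through DataFrame.attrs, the dictionary pandas offers for user metadata. The sketch below assumes pandas 2.0 or later. The "source" key and its value are placeholders, and attrs itself is documented as experimental, so treat this as an illustration rather than a guarantee.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 7.0]})

# Attach hypothetical metadata to the frame.
df.attrs["source"] = "example-dataset"

# Both results are expected to carry the same attrs as the input frame.
corr = df.corr()
cov = df.cov()
print(corr.attrs, cov.attrs)
```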
Other#
Contributors#
A total of 260 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
5j9 +
ABCPAN-rank +
Aarni Koskela +
Aashish KC +
Abubeker Mohammed +
Adam Mróz +
Adam Ormondroyd +
Aditya Anulekh +
Ahmed Ibrahim
Akshay Babbar +
Aleksa Radojicic +
Alex +
Alex Buzenet +
Alex Kirko
Allison Kwan +
Amay Patel +
Ambuj Pawar +
Amotz +
Andreas Schwab +
Andrew Chen +
Anton Shevtsov
Antonio Ossa Guerra +
Antonio Ossa-Guerra +
Anushka Bishnoi +
Arda Kosar
Armin Berres
Asadullah Naeem +
Asish Mahapatra
Bailey Lissington +
BarkotBeyene
Ben Beasley
Bhavesh Rajendra Patil +
Bibek Jha +
Bill +
Bishwas +
CarlosGDCJ +
Carlotta Fabian +
Chris Roth +
Chuck Cadman +
Corralien +
DG +
Dan Hendry +
Daniel Isaac
David Kleindienst +
David Poznik +
David Rudel +
DavidKleindienst +
Dea María Léon +
Deepak Sirohiwal +
Dennis Chukwunta
Douglas Lohmann +
Dries Schaumont
Dustin K +
Edoardo Abati +
Eduardo Chaves +
Ege Özgüroğlu +
Ekaterina Borovikova +
Eli Schwartz +
Elvis Lim +
Emily Taylor +
Emma Carballal Haire +
Erik Welch +
Fangchen Li
Florian Hofstetter +
Flynn Owen +
Fredrik Erlandsson +
Gaurav Sheni
Georeth Chow +
George Munyoro +
Guilherme Beltramini
Gulnur Baimukhambetova +
H L +
Hans
Hatim Zahid +
HighYoda +
Hiki +
Himanshu Wagh +
Hugo van Kemenade +
Idil Ismiguzel +
Irv Lustig
Isaac Chung
Isaac Virshup
JHM Darbyshire
JHM Darbyshire (iMac)
JMBurley
Jaime Di Cristina
Jan Koch
JanVHII +
Janosh Riebesell
JasmandeepKaur +
Jeremy Tuloup
Jessica M +
Jonas Haag
Joris Van den Bossche
João Meirelles +
Julia Aoun +
Justus Magin +
Kang Su Min +
Kevin Sheppard
Khor Chean Wei
Kian Eliasi
Kostya Farber +
KotlinIsland +
Lakmal Pinnaduwage +
Lakshya A Agrawal +
Lawrence Mitchell +
Levi Ob +
Loic Diridollou
Lorenzo Vainigli +
Luca Pizzini +
Lucas Damo +
Luke Manley
Madhuri Patil +
Marc Garcia
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Maren Westermann +
Maria Stazherova +
Marie K +
Marielle +
Mark Harfouche +
Marko Pacak +
Martin +
Matheus Cerqueira +
Matheus Pedroni +
Matteo Raso +
Matthew Roeschke
MeeseeksMachine +
Mehdi Mohammadi +
Michael Harris +
Michael Mior +
Natalia Mokeeva +
Neal Muppidi +
Nick Crews
Nishu Choudhary +
Noa Tamir
Noritada Kobayashi
Omkar Yadav +
P. Talley +
Pablo +
Pandas Development Team
Parfait Gasana
Patrick Hoefler
Pedro Nacht +
Philip +
Pietro Battiston
Pooja Subramaniam +
Pranav Saibhushan Ravuri +
Pranav. P. A +
Ralf Gommers +
RaphSku +
Richard Shadrach
Robsdedude +
Roger
Roger Thomas
RogerThomas +
SFuller4 +
Salahuddin +
Sam Rao
Sean Patrick Malloy +
Sebastian Roll +
Shantanu
Shashwat +
Shashwat Agrawal +
Shiko Wamwea +
Shoham Debnath
Shubhankar Lohani +
Siddhartha Gandhi +
Simon Hawkins
Soumik Dutta +
Sowrov Talukder +
Stefanie Molin
Stefanie Senger +
Stepfen Shawn +
Steven Rotondo
Stijn Van Hoey
Sudhansu +
Sven
Sylvain MARIE
Sylvain Marié
Tabea Kossen +
Taylor Packard
Terji Petersen
Thierry Moisan
Thomas H +
Thomas Li
Torsten Wörtwein
Tsvika S +
Tsvika Shapira +
Vamsi Verma +
Vinicius Akira +
William Andrea
William Ayd
William Blum +
Wilson Xing +
Xiao Yuan +
Xnot +
Yasin Tatar +
Yuanhao Geng
Yvan Cywan +
Zachary Moon +
Zhengbo Wang +
abonte +
adrienpacifico +
alm
amotzop +
andyjessen +
anonmouse1 +
bang128 +
bishwas jha +
calhockemeyer +
carla-alves-24 +
carlotta +
casadipietra +
catmar22 +
cfabian +
codamuse +
dataxerik
davidleon123 +
dependabot[bot] +
fdrocha +
github-actions[bot]
himanshu_wagh +
iofall +
jakirkham +
jbrockmendel
jnclt +
joelchen +
joelsonoda +
joshuabello2550
joycewamwea +
kathleenhang +
krasch +
ltoniazzi +
luke396 +
milosz-martynow +
minat-hub +
mliu08 +
monosans +
nealxm
nikitaved +
paradox-lab +
partev
raisadz +
ram vikram singh +
rebecca-palmer
sarvaSanjay +
seljaks +
silviaovo +
smij720 +
soumilbaldota +
stellalin7 +
strawberry beach sandals +
tmoschou +
uzzell +
yqyqyq-W +
yun +
Ádám Lippai
김동현 (Daniel Donghyun Kim) +