What’s new in 1.4.0 (January 22, 2022)
These are the changes in pandas 1.4.0. See Release notes for a full changelog including other versions of pandas.
Enhancements
Improved warning messages
Previously, warning messages may have pointed to lines within the pandas
library. Running the script setting_with_copy_warning.py:
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
df[:2].loc[:, 'a'] = 5
with pandas 1.3 resulted in:
.../site-packages/pandas/core/indexing.py:1951: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
This made it difficult to determine where the warning was being generated from. Now pandas will inspect the call stack, reporting the first line outside of the pandas library that gave rise to the warning. The output of the above script is now:
setting_with_copy_warning.py:4: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
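The stack-inspection idea can be illustrated with the standard library alone. The sketch below is a simplified demonstration, not pandas' internal implementation: the helper `find_stack_level`, the predicate `_is_library_frame`, and the convention that "library" functions are those whose names start with `_lib_` are all assumptions made for this example.

```python
import inspect
import warnings


def _is_library_frame(frame) -> bool:
    # Assumption for this demo: "library" code is any function whose name
    # starts with "_lib_". A real library would compare file paths instead.
    return frame.f_code.co_name.startswith("_lib_")


def find_stack_level() -> int:
    """Return a stacklevel that points at the first non-library caller."""
    frame = inspect.currentframe().f_back  # the function about to warn
    level = 1
    while frame is not None and _is_library_frame(frame):
        frame = frame.f_back
        level += 1
    return level


def _lib_inner():
    warnings.warn("demo warning", UserWarning, stacklevel=find_stack_level())


def _lib_outer():
    _lib_inner()


def user_code() -> int:
    expected_line = inspect.currentframe().f_lineno + 1
    _lib_outer()  # the warning should be attributed to this line
    return expected_line


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    expected_line = user_code()

reported_line = caught[0].lineno
```

Here `reported_line` equals the line of the `_lib_outer()` call inside `user_code`, rather than a line inside either `_lib_` function.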
Index can hold arbitrary ExtensionArrays
Until now, passing a custom ExtensionArray to pd.Index would cast
the array to object dtype. Now Index can directly hold arbitrary
ExtensionArrays (GH 43930).
In [1]: arr = pd.array([1, 2, pd.NA])
In [2]: idx = pd.Index(arr)
In the old behavior, idx would be object-dtype:
Previous behavior:
In [1]: idx
Out[1]: Index([1, 2, <NA>], dtype='object')
With the new behavior, we keep the original dtype:
New behavior:
In [3]: idx
Out[3]: Index([1, 2, <NA>], dtype='Int64')
One exception to this is SparseArray, which will continue to cast to numpy
dtype until pandas 2.0. At that point it will retain its dtype like other
ExtensionArrays.
Styler
Styler has been further developed in 1.4.0. The following general enhancements have been made:
Styling and formatting of indexes has been added, with Styler.apply_index(), Styler.applymap_index() and Styler.format_index(). These mirror the signature of the methods already used to style and format data values, and work with HTML, LaTeX and Excel formats (GH 41893, GH 43101, GH 41993, GH 41995)
The new method Styler.hide() deprecates Styler.hide_index() and Styler.hide_columns() (GH 43758)
The keyword arguments level and names have been added to Styler.hide() (and implicitly to the deprecated methods Styler.hide_index() and Styler.hide_columns()) for additional control of visibility of MultiIndexes and of Index names (GH 25475, GH 43404, GH 43346)
Styler.export() and Styler.use() have been updated to address all of the added functionality from v1.2.0 and v1.3.0 (GH 40675)
Global options under the category pd.options.styler have been extended to configure default Styler properties which address formatting, encoding, and HTML and LaTeX rendering. Note that formerly Styler relied on display.html.use_mathjax, which has now been replaced by styler.html.mathjax (GH 41395)
Validation of certain keyword arguments, e.g. caption (GH 43368)
Various bug fixes as recorded below
Additionally there are specific enhancements to the HTML specific rendering:
Styler.bar() introduces additional arguments to control alignment and display (GH 26070, GH 36419), and it also validates the input arguments width and height (GH 42511)
Styler.to_html() introduces keyword arguments sparse_index, sparse_columns, bold_headers, caption, max_rows and max_columns (GH 41946, GH 43149, GH 42972)
Styler.to_html() omits CSSStyle rules for hidden table elements as a performance enhancement (GH 43619)
Custom CSS classes can now be directly specified without string replacement (GH 43686)
Ability to render hyperlinks automatically via a new hyperlinks formatting keyword argument (GH 45058)
There are also some LaTeX specific enhancements:
Styler.to_latex() introduces keyword argument environment, which also allows a specific “longtable” entry through a separate jinja2 template (GH 41866)
Naive sparsification is now possible for LaTeX without the necessity of including the multirow package (GH 43369)
cline support has been added for MultiIndex row sparsification through a keyword argument (GH 45138)
Multi-threaded CSV reading with a new CSV Engine based on pyarrow
pandas.read_csv() now accepts engine="pyarrow" (requires at least
pyarrow 1.0.1) as an argument, allowing for faster CSV parsing on multicore
machines with pyarrow installed. See the I/O docs for
more info. (GH 23697, GH 43706)
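Because pyarrow is an optional dependency, code that opts into the new engine may want a fallback. The following is a minimal sketch under that assumption; the helper name `read_with_best_engine` is hypothetical, and it relies on pandas raising `ImportError` when pyarrow is missing or too old.

```python
import io

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"


def read_with_best_engine(text: str) -> pd.DataFrame:
    """Try the pyarrow engine, falling back to the default C engine."""
    try:
        # The pyarrow engine reads bytes, so a binary buffer is passed here.
        return pd.read_csv(io.BytesIO(text.encode()), engine="pyarrow")
    except ImportError:
        # pyarrow is not installed (or is older than pandas requires).
        return pd.read_csv(io.StringIO(text))


df = read_with_best_engine(csv_text)
```

Either path yields the same two-column frame for this input; the pyarrow engine only pays off on larger files and multicore machines.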
Rank function for rolling and expanding windows
Added rank function to Rolling and Expanding. The new
function supports the method, ascending, and pct flags of
DataFrame.rank(). The method argument supports min, max, and
average ranking methods.
Example:
In [4]: s = pd.Series([1, 4, 2, 3, 5, 3])
In [5]: s.rolling(3).rank()
Out[5]:
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 1.5
dtype: float64
In [6]: s.rolling(3).rank(method="max")
Out[6]:
0 NaN
1 NaN
2 2.0
3 2.0
4 3.0
5 2.0
dtype: float64
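The examples above cover Rolling only. As a brief sketch of the Expanding counterpart, using the same data, each value is ranked among all values seen so far:

```python
import pandas as pd

s = pd.Series([1, 4, 2, 3, 5, 3])

# Rank of each element within the expanding window that ends at it
# (default method is "average", so the tied 3s share rank 3.5).
expanding_rank = s.expanding().rank()
# expanding_rank.tolist() -> [1.0, 2.0, 2.0, 3.0, 5.0, 3.5]
```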
Groupby positional indexing
It is now possible to specify positional ranges relative to the ends of each group.
Negative arguments for DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail(), and SeriesGroupBy.tail() now work
correctly and result in ranges relative to the end and start of each group,
respectively. Previously, negative arguments returned empty frames.
In [7]: df = pd.DataFrame([["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"],
...: ["h", "h0"], ["h", "h1"]], columns=["A", "B"])
...:
In [8]: df.groupby("A").head(-1)
Out[8]:
A B
0 g g0
1 g g1
2 g g2
4 h h0
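For symmetry, a short sketch of the tail() case with the same frame: a negative argument drops rows counted from the start of each group.

```python
import pandas as pd

df = pd.DataFrame(
    [["g", "g0"], ["g", "g1"], ["g", "g2"], ["g", "g3"], ["h", "h0"], ["h", "h1"]],
    columns=["A", "B"],
)

# tail(-1) keeps everything except the first row of each group.
all_but_first = df.groupby("A").tail(-1)
# all_but_first["B"].tolist() -> ['g1', 'g2', 'g3', 'h1']
```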
DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept a slice or list of integers and slices.
In [9]: df.groupby("A").nth(slice(1, -1))
Out[9]:
A B
1 g g1
2 g g2
In [10]: df.groupby("A").nth([slice(None, 1), slice(-1, None)])
Out[10]:
A B
0 g g0
3 g g3
4 h h0
5 h h1
DataFrameGroupBy.nth() and SeriesGroupBy.nth() now accept index notation.
In [11]: df.groupby("A").nth[1, -1]
Out[11]:
A B
1 g g1
3 g g3
5 h h1
In [12]: df.groupby("A").nth[1:-1]
Out[12]:
A B
1 g g1
2 g g2
In [13]: df.groupby("A").nth[:1, -1:]
Out[13]:
A B
0 g g0
3 g g3
4 h h0
5 h h1
DataFrame.from_dict and DataFrame.to_dict have new 'tight' option
A new 'tight' dictionary format that preserves MultiIndex entries
and names is now available with the DataFrame.from_dict() and
DataFrame.to_dict() methods and can be used with the standard json
library to produce a tight representation of DataFrame objects
(GH 4889).
In [14]: df = pd.DataFrame.from_records(
....: [[1, 3], [2, 4]],
....: index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")],
....: names=["n1", "n2"]),
....: columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)],
....: names=["z1", "z2"]),
....: )
....:
In [15]: df
Out[15]:
z1 x y
z2 1 2
n1 n2
a b 1 3
c 2 4
In [16]: df.to_dict(orient='tight')
Out[16]:
{'index': [('a', 'b'), ('a', 'c')],
'columns': [('x', 1), ('y', 2)],
'data': [[1, 3], [2, 4]],
'index_names': ['n1', 'n2'],
'column_names': ['z1', 'z2']}
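The point of the 'tight' format is that the round trip is lossless for MultiIndex labels and names. A minimal sketch of that round trip, rebuilding the same frame as above directly with the DataFrame constructor:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 3], [2, 4]],
    index=pd.MultiIndex.from_tuples([("a", "b"), ("a", "c")], names=["n1", "n2"]),
    columns=pd.MultiIndex.from_tuples([("x", 1), ("y", 2)], names=["z1", "z2"]),
)

# Export to the tight dictionary format and rebuild the frame from it.
tight = df.to_dict(orient="tight")
restored = pd.DataFrame.from_dict(tight, orient="tight")

round_trip_ok = restored.equals(df)
```

Both the index names (`n1`, `n2`) and the column names (`z1`, `z2`) survive, which the other `orient` values do not guarantee.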
Other enhancements
concat() will preserve the attrs when it is the same for all objects and discard the attrs when they are different (GH 41828)
DataFrameGroupBy operations with as_index=False now correctly retain ExtensionDtype dtypes for columns being grouped on (GH 41373)
Add support for assigning values to by argument in DataFrame.plot.hist() and DataFrame.plot.box() (GH 15079)
Series.sample(), DataFrame.sample(), DataFrameGroupBy.sample(), and SeriesGroupBy.sample() now accept a np.random.Generator as input to random_state. A generator will be more performant, especially with replace=False (GH 38100)
Series.ewm() and DataFrame.ewm() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH 42273)
DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() now support the argument skipna (GH 34047)
read_table() now supports the argument storage_options (GH 39167)
DataFrame.to_stata() and StataWriter() now accept the keyword only argument value_labels to save labels for non-categorical columns (GH 38454)
Methods that rely on hashmap-based algorithms, such as DataFrameGroupBy.value_counts(), DataFrameGroupBy.count() and factorize(), previously ignored the imaginary component of complex numbers (GH 17927)
Add Series.str.removeprefix() and Series.str.removesuffix() introduced in Python 3.9 to remove pre-/suffixes from string-type Series (GH 36944)
Attempting to write into a file in a missing parent directory with DataFrame.to_csv(), DataFrame.to_html(), DataFrame.to_excel(), DataFrame.to_feather(), DataFrame.to_parquet(), DataFrame.to_stata(), DataFrame.to_json(), DataFrame.to_pickle(), and DataFrame.to_xml() now explicitly mentions the missing parent directory; the same is true for Series counterparts (GH 24306)
Indexing with .loc and .iloc now supports Ellipsis (GH 37750)
IntegerArray.all(), IntegerArray.any(), FloatingArray.any(), and FloatingArray.all() use Kleene logic (GH 41967)
Added support for nullable boolean and integer types in DataFrame.to_stata(), StataWriter, StataWriter117, and StataWriterUTF8 (GH 40855)
DataFrame.__pos__() and DataFrame.__neg__() now retain ExtensionDtype dtypes (GH 43883)
The error raised when an optional dependency can’t be imported now includes the original exception, for easier investigation (GH 43882)
Series.str.split() now supports a regex argument that explicitly specifies whether the pattern is a regular expression. Default is None (GH 43563, GH 32835, GH 25549)
DataFrame.dropna() now accepts a single label as subset along with array-like (GH 41021)
Added DataFrameGroupBy.value_counts() (GH 43564)
read_csv() now accepts a callable function in on_bad_lines when engine="python" for custom handling of bad lines (GH 5686)
ExcelWriter argument if_sheet_exists="overlay" option added (GH 40231)
read_excel() now accepts a decimal argument that allows the user to specify the decimal point when parsing string columns to numeric (GH 14403)
DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.std(), SeriesGroupBy.std(), DataFrameGroupBy.var(), SeriesGroupBy.var(), DataFrameGroupBy.sum(), and SeriesGroupBy.sum() now support Numba execution with the engine keyword (GH 43731, GH 44862, GH 44939)
Timestamp.isoformat() now handles the timespec argument from the base datetime class (GH 26131)
NaT.to_numpy() dtype argument is now respected, so np.timedelta64 can be returned (GH 44460)
New option display.max_dir_items customizes the number of columns added to DataFrame.__dir__() and suggested for tab completion (GH 37996)
Added “Juneteenth National Independence Day” to USFederalHolidayCalendar (GH 44574)
Rolling.var(), Expanding.var(), Rolling.std(), and Expanding.std() now support Numba execution with the engine keyword (GH 44461)
Series.info() has been added, for compatibility with DataFrame.info() (GH 5167)
Implemented IntervalArray.min() and IntervalArray.max(), as a result of which min and max now work for IntervalIndex, Series and DataFrame with IntervalDtype (GH 44746)
UInt64Index.map() now retains dtype where possible (GH 44609)
read_json() can now parse unsigned long long integers (GH 26068)
DataFrame.take() now raises a TypeError when passed a scalar for the indexer (GH 42875)
is_list_like() now identifies duck-arrays as list-like unless .ndim == 0 (GH 35131)
ExtensionDtype and ExtensionArray are now (de)serialized when exporting a DataFrame with DataFrame.to_json() using orient='table' (GH 20612, GH 44705)
Add support for Zstandard compression to DataFrame.to_pickle()/read_pickle() and friends (GH 43925)
DataFrame.to_sql() now returns an int of the number of written rows (GH 23998)
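A few of these additions are easiest to see in use. The snippet below is an illustrative sketch with made-up data, touching Series.str.removeprefix()/removesuffix(), a single-label subset in DataFrame.dropna(), and attrs preservation in concat():

```python
import pandas as pd

# Series.str.removeprefix() / removesuffix(): only exact matches are stripped.
files = pd.Series(["tmp_report.csv", "tmp_data.csv", "notes.txt"])
stems = files.str.removeprefix("tmp_").str.removesuffix(".csv")
# stems.tolist() -> ['report', 'data', 'notes.txt']

# DataFrame.dropna() with a single label as subset (no list required).
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": [None, 5.0, 6.0]})
rows_with_a = df.dropna(subset="a")

# concat() keeps attrs when every input carries the same attrs.
left = pd.Series([1, 2])
right = pd.Series([3, 4])
left.attrs = {"unit": "m"}
right.attrs = {"unit": "m"}
combined = pd.concat([left, right])
```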
Notable bug fixes
These are bug fixes that might have notable behavior changes.
Inconsistent date string parsing
The dayfirst option of to_datetime() isn’t strict, and this can lead
to surprising behavior:
In [17]: pd.to_datetime(["31-12-2021"], dayfirst=False)
Out[17]: DatetimeIndex(['2021-12-31'], dtype='datetime64[ns]', freq=None)
Now, a warning will be raised if a date string cannot be parsed in accordance
with the given dayfirst value when the value is a delimited date string (e.g.
31-12-2012).
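One way to sidestep the ambiguity entirely is to state the intended layout. The sketch below uses made-up date strings and shows two options: an explicit format, or dayfirst=True when the data really is day-first.

```python
import pandas as pd

dates = ["31-12-2021", "01-02-2022"]

# Option 1: spell out the layout, leaving nothing to inference.
explicit = pd.to_datetime(dates, format="%d-%m-%Y")

# Option 2: declare that the day comes first.
day_first = pd.to_datetime(dates, dayfirst=True)

# Both read the second string as 1 February 2022, not 2 January 2022.
```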
Ignoring dtypes in concat with empty or all-NA columns
Note
This behaviour change has been reverted in pandas 1.4.3.
When using concat() to concatenate two or more DataFrame objects,
if one of the DataFrames was empty or had all-NA values, its dtype was
sometimes ignored when finding the concatenated dtype. These are now
consistently not ignored (GH 43507).
In [3]: df1 = pd.DataFrame({"bar": [pd.Timestamp("2013-01-01")]}, index=range(1))
In [4]: df2 = pd.DataFrame({"bar": np.nan}, index=range(1, 2))
In [5]: res = pd.concat([df1, df2])
Previously, the float-dtype in df2 would be ignored so the result dtype
would be datetime64[ns]. As a result, the np.nan would be cast to
NaT.
Previous behavior:
In [6]: res
Out[6]:
bar
0 2013-01-01
1 NaT
Now the float-dtype is respected. Since the common dtype for these DataFrames is
object, the np.nan is retained.
New behavior:
In [6]: res
Out[6]:
bar
0 2013-01-01 00:00:00
1 NaN
Null-values are no longer coerced to NaN-value in value_counts and mode
Series.value_counts() and Series.mode() no longer coerce None,
NaT and other null-values to a NaN-value for np.object_-dtype. This
behavior is now consistent with unique, isin and others
(GH 42688).
In [18]: s = pd.Series([True, None, pd.NaT, None, pd.NaT, None])
In [19]: res = s.value_counts(dropna=False)
Previously, all null-values were replaced by a NaN-value.
Previous behavior:
In [3]: res
Out[3]:
NaN 5
True 1
dtype: int64
Now null-values are no longer mangled.
New behavior:
In [20]: res
Out[20]:
None 3
NaT 2
True 1
Name: count, dtype: int64
mangle_dupe_cols in read_csv no longer renames unique columns conflicting with target names
read_csv() no longer renames unique column labels which conflict with the target
names of duplicated columns. Already existing columns are skipped, i.e. the next
available index is used for the target column name (GH 14704).
In [21]: import io
In [22]: data = "a,a,a.1\n1,2,3"
In [23]: res = pd.read_csv(io.StringIO(data))
Previously, the second column was called a.1, while the third column was
also renamed to a.1.1.
Previous behavior:
In [3]: res
Out[3]:
a a.1 a.1.1
0 1 2 3
Now the renaming checks if a.1 already exists when changing the name of the
second column and jumps this index. The second column is instead renamed to
a.2.
New behavior:
In [24]: res
Out[24]:
a a.2 a.1
0 1 2 3
unstack and pivot_table no longer raise ValueError for result that would exceed int32 limit
Previously DataFrame.pivot_table() and DataFrame.unstack() would
raise a ValueError if the operation could produce a result with more than
2**31 - 1 elements. This operation now raises an
errors.PerformanceWarning instead (GH 26314).
Previous behavior:
In [3]: df = pd.DataFrame({"ind1": np.arange(2 ** 16), "ind2": np.arange(2 ** 16), "count": 0})
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
ValueError: Unstacked DataFrame is too big, causing int32 overflow
New behavior:
In [4]: df.pivot_table(index="ind1", columns="ind2", values="count", aggfunc="count")
PerformanceWarning: The following operation may generate 4294967296 cells in the resulting pandas object.
groupby.apply consistent transform detection
DataFrameGroupBy.apply() and SeriesGroupBy.apply() are designed to be flexible, allowing users to perform
aggregations, transformations, filters, and use it with user-defined functions
that might not fall into any of these categories. As part of this, apply will
attempt to detect when an operation is a transform, and in such a case, the
result will have the same index as the input. In order to determine if the
operation is a transform, pandas compares the input’s index to the result’s and
determines if it has been mutated. Previously in pandas 1.3, different code
paths used different definitions of “mutated”: some would use Python’s is
whereas others would test only up to equality.
This inconsistency has been removed; pandas now tests up to equality.
In [25]: def func(x):
....: return x.copy()
....:
In [26]: df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})
In [27]: df
Out[27]:
a b c
0 1 3 5
1 2 4 6
Previous behavior:
In [3]: df.groupby(['a']).apply(func)
Out[3]:
a b c
a
1 0 1 3 5
2 1 2 4 6
In [4]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[4]:
c
a b
1 3 5
2 4 6
In the examples above, the first uses a code path where pandas uses is and
determines that func is not a transform whereas the second tests up to
equality and determines that func is a transform. In the first case, the
result’s index is not the same as the input’s.
New behavior:
In [5]: df.groupby(['a']).apply(func)
Out[5]:
a b c
0 1 3 5
1 2 4 6
In [6]: df.set_index(['a', 'b']).groupby(['a']).apply(func)
Out[6]:
c
a b
1 3 5
2 4 6
Now in both cases it is determined that func is a transform. In each case,
the result has the same index as the input.
Backwards incompatible API changes
Increased minimum version for Python
pandas 1.4.0 supports Python 3.8 and higher.
Increased minimum versions for dependencies
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.18.5 | X | X |
| pytz | 2020.1 | X | X |
| python-dateutil | 2.8.1 | X | X |
| bottleneck | 1.3.1 | | X |
| numexpr | 2.7.1 | | X |
| pytest (dev) | 6.0 | | |
| mypy (dev) | 0.930 | | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| beautifulsoup4 | 4.8.2 | X |
| fastparquet | 0.4.0 | |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.5.0 | X |
| matplotlib | 3.3.2 | X |
| numba | 0.50.1 | X |
| openpyxl | 3.0.3 | X |
| pandas-gbq | 0.14.0 | X |
| pyarrow | 1.0.1 | X |
| pymysql | 0.10.1 | X |
| pytables | 3.6.1 | X |
| s3fs | 0.4.0 | |
| scipy | 1.4.1 | X |
| sqlalchemy | 1.4.0 | X |
| tabulate | 0.8.7 | |
| xarray | 0.15.1 | X |
| xlrd | 2.0.1 | X |
| xlsxwriter | 1.2.2 | X |
| xlwt | 1.3.0 | |
See Dependencies and Optional dependencies for more.
Other API changes
Index.get_indexer_for() no longer accepts keyword arguments (other than target); in the past these would be silently ignored if the index was not unique (GH 42310)
Change in the position of the min_rows argument in DataFrame.to_string() due to change in the docstring (GH 44304)
Reduction operations for DataFrame or Series now raise a ValueError when None is passed for skipna (GH 44178)
read_csv() and read_html() no longer raise an error when one of the header rows consists only of Unnamed: columns (GH 13054)
Changed the name attribute of several holidays in USFederalHolidayCalendar to match official federal holiday names, specifically:
“New Year’s Day” gains the possessive apostrophe
“Presidents Day” becomes “Washington’s Birthday”
“Martin Luther King Jr. Day” is now “Birthday of Martin Luther King, Jr.”
“July 4th” is now “Independence Day”
“Thanksgiving” is now “Thanksgiving Day”
“Christmas” is now “Christmas Day”
Added “Juneteenth National Independence Day”
Deprecations
Deprecated Int64Index, UInt64Index & Float64Index
Int64Index, UInt64Index and Float64Index have been
deprecated in favor of the base Index class and will be removed in
pandas 2.0 (GH 43028).
For constructing a numeric index, you can use the base Index class
instead, specifying the data type (which will also work on older pandas
releases):
# replace
pd.Int64Index([1, 2, 3])
# with
pd.Index([1, 2, 3], dtype="int64")
For checking the data type of an index object, you can replace isinstance
checks with checking the dtype:
# replace
isinstance(idx, pd.Int64Index)
# with
idx.dtype == "int64"
Currently, in order to maintain backward compatibility, calls to Index
will continue to return Int64Index, UInt64Index and
Float64Index when given numeric data, but in the future, an
Index will be returned.
Current behavior:
In [1]: pd.Index([1, 2, 3], dtype="int32")
Out [1]: Int64Index([1, 2, 3], dtype='int64')
In [2]: pd.Index([1, 2, 3], dtype="uint64")
Out [2]: UInt64Index([1, 2, 3], dtype='uint64')
Future behavior:
In [3]: pd.Index([1, 2, 3], dtype="int32")
Out [3]: Index([1, 2, 3], dtype='int32')
In [4]: pd.Index([1, 2, 3], dtype="uint64")
Out [4]: Index([1, 2, 3], dtype='uint64')
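Code that must run both before and after the removal can rely on dtype checks alone. The helper below is a small sketch of that approach; the function name `is_int64_index` is hypothetical and only wraps the dtype comparison recommended above.

```python
import pandas as pd


def is_int64_index(idx: pd.Index) -> bool:
    """Version-independent replacement for isinstance(idx, pd.Int64Index)."""
    return idx.dtype == "int64"


int_idx = pd.Index([1, 2, 3], dtype="int64")
float_idx = pd.Index([1.0, 2.0], dtype="float64")

int_check = is_int64_index(int_idx)      # True
float_check = is_int64_index(float_idx)  # False
```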
Deprecated DataFrame.append and Series.append
DataFrame.append() and Series.append() have been deprecated and will
be removed in a future version. Use pandas.concat() instead (GH 35407).
Deprecated syntax
In [1]: pd.Series([1, 2]).append(pd.Series([3, 4]))
Out [1]:
<stdin>:1: FutureWarning: The series.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
0 1
1 2
0 3
1 4
dtype: int64
In [2]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [3]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [4]: df1.append(df2)
Out [4]:
<stdin>:1: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
A B
0 1 2
1 3 4
0 5 6
1 7 8
Recommended syntax
In [28]: pd.concat([pd.Series([1, 2]), pd.Series([3, 4])])
Out[28]:
0 1
1 2
0 3
1 4
dtype: int64
In [29]: df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
In [30]: df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list('AB'))
In [31]: pd.concat([df1, df2])
Out[31]:
A B
0 1 2
1 3 4
0 5 6
1 7 8
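Two common append patterns need slightly more than a direct swap. The sketch below, with illustrative data, shows a concat() equivalent for append(..., ignore_index=True) and for appending rows inside a loop, where collecting the pieces and concatenating once avoids repeated copying.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=list("AB"))

# Equivalent of the deprecated df1.append(df2, ignore_index=True).
combined = pd.concat([df1, df2], ignore_index=True)

# Instead of appending inside a loop, collect the pieces and concat once.
pieces = [pd.DataFrame([[i, i * 10]], columns=list("AB")) for i in range(3)]
stacked = pd.concat(pieces, ignore_index=True)
```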
Other Deprecations
Deprecated Index.is_type_compatible() (GH 42113)
Deprecated method argument in Index.get_loc(), use index.get_indexer([label], method=...) instead (GH 42269)
Deprecated treating integer keys in Series.__setitem__() as positional when the index is a Float64Index not containing the key, an IntervalIndex with no entries containing the key, or a MultiIndex with leading Float64Index level not containing the key (GH 33469)
Deprecated treating numpy.datetime64 objects as UTC times when passed to the Timestamp constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use Timestamp(dt64).tz_localize("UTC").tz_convert(tz) (GH 24559)
Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (GH 42351)
Creating an empty Series without a dtype will now raise a more visible FutureWarning instead of a DeprecationWarning (GH 30017)
Deprecated the kind argument in Index.get_slice_bound(), Index.slice_indexer(), and Index.slice_locs(); in a future version passing kind will raise (GH 42857)
Deprecated dropping of nuisance columns in Rolling, Expanding, and EWM aggregations (GH 42738)
Deprecated Index.reindex() with a non-unique Index (GH 42568)
Deprecated Styler.render() in favor of Styler.to_html() (GH 42140)
Deprecated Styler.hide_index() and Styler.hide_columns() in favor of Styler.hide() (GH 43758)
Deprecated passing in a string column label into times in DataFrame.ewm() (GH 43265)
Deprecated the include_start and include_end arguments in DataFrame.between_time(); in a future version passing include_start or include_end will raise (GH 40245)
Deprecated the squeeze argument to read_csv(), read_table(), and read_excel(). Users should squeeze the DataFrame afterwards with .squeeze("columns") instead (GH 43242)
Deprecated the index argument to SparseArray construction (GH 23089)
Deprecated the closed argument in date_range() and bdate_range() in favor of inclusive argument; in a future version passing closed will raise (GH 40245)
Deprecated Rolling.validate(), Expanding.validate(), and ExponentialMovingWindow.validate() (GH 43665)
Deprecated silent dropping of columns that raised a TypeError in Series.transform and DataFrame.transform when used with a dictionary (GH 43740)
Deprecated silent dropping of columns that raised a TypeError, DataError, and some cases of ValueError in Series.aggregate(), DataFrame.aggregate(), Series.groupby.aggregate(), and DataFrame.groupby.aggregate() when used with a list (GH 43740)
Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware Series or DataFrame column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column’s existing timezone (GH 37605)
Deprecated casting behavior when passing an item with mismatched-timezone to DatetimeIndex.insert(), DatetimeIndex.putmask(), DatetimeIndex.where(), DatetimeIndex.fillna(), Series.mask(), Series.where(), Series.fillna(), Series.shift(), Series.replace(), Series.reindex() (and DataFrame column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series’s timezone (GH 37605, GH 44940)
Deprecated the prefix keyword argument in read_csv() and read_table(); in a future version the argument will be removed (GH 43396)
Deprecated passing non boolean argument to sort in concat() (GH 41518)
Deprecated passing arguments as positional for read_fwf() other than filepath_or_buffer (GH 41485)
Deprecated passing arguments as positional for read_xml() other than path_or_buffer (GH 45133)
Deprecated passing skipna=None for DataFrame.mad() and Series.mad(), pass skipna=True instead (GH 44580)
Deprecated the behavior of to_datetime() with the string “now” with utc=False; in a future version this will match Timestamp("now"), which in turn matches Timestamp.now() returning the local time (GH 18705)
Deprecated DateOffset.apply(), use offset + other instead (GH 44522)
Deprecated parameter names in Index.copy() (GH 44916)
A deprecation warning is now shown for DataFrame.to_latex() indicating the arguments signature may change and emulate more the arguments to Styler.to_latex() in future versions (GH 44411)
Deprecated behavior of concat() between objects with bool-dtype and numeric-dtypes; in a future version these will cast to object dtype instead of coercing bools to numeric values (GH 39817)
Deprecated Categorical.replace(), use Series.replace() instead (GH 44929)
Deprecated passing set or dict as indexer for DataFrame.loc.__setitem__(), DataFrame.loc.__getitem__(), Series.loc.__setitem__(), Series.loc.__getitem__(), DataFrame.__getitem__(), Series.__getitem__() and Series.__setitem__() (GH 42825)
Deprecated Index.__getitem__() with a bool key; use index.values[key] to get the old behavior (GH 44051)
Deprecated downcasting column-by-column in DataFrame.where() with integer-dtypes (GH 44597)
Deprecated DatetimeIndex.union_many(), use DatetimeIndex.union() instead (GH 44091)
Deprecated Groupby.pad() in favor of Groupby.ffill() (GH 33396)
Deprecated Groupby.backfill() in favor of Groupby.bfill() (GH 33396)
Deprecated Resample.pad() in favor of Resample.ffill() (GH 33396)
Deprecated Resample.backfill() in favor of Resample.bfill() (GH 33396)
Deprecated numeric_only=None in DataFrame.rank(); in a future version numeric_only must be either True or False (the default) (GH 45036)
Deprecated the behavior of Timestamp.utcfromtimestamp(), in the future it will return a timezone-aware UTC Timestamp (GH 22451)
Deprecated NaT.freq() (GH 45071)
Deprecated behavior of Series and DataFrame construction when passed float-dtype data containing NaN and an integer dtype ignoring the dtype argument; in a future version this will raise (GH 40110)
Deprecated the behaviour of Series.to_frame() and Index.to_frame() to ignore the name argument when name=None. Currently, this means to preserve the existing name, but in the future explicitly passing name=None will set None as the name of the column in the resulting DataFrame (GH 44212)
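Several of these deprecations name a direct replacement. The following sketch, using made-up data, shows three of them side by side: squeezing after read_csv(), Index.get_indexer() in place of Index.get_loc(method=...), and inclusive in place of closed for date_range().

```python
import io

import pandas as pd

# Instead of read_csv(..., squeeze=True): squeeze the result afterwards.
series = pd.read_csv(io.StringIO("x\n1\n2\n3\n")).squeeze("columns")

# Instead of idx.get_loc(22, method="nearest"): use get_indexer on a list.
idx = pd.Index([10, 20, 30])
position = idx.get_indexer([22], method="nearest")[0]  # 20 is nearest -> 1

# Instead of date_range(..., closed="left"): use inclusive="left".
days = pd.date_range("2022-01-01", "2022-01-04", inclusive="left")
```

The last call excludes the right endpoint, so `days` stops at 2022-01-03.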
Performance improvements
Performance improvement in DataFrameGroupBy.sample() and SeriesGroupBy.sample(), especially when weights argument provided (GH 34483)
Performance improvement when converting non-string arrays to string arrays (GH 34483)
Performance improvement in DataFrameGroupBy.transform() and SeriesGroupBy.transform() for user-defined functions (GH 41598)
Performance improvement in constructing DataFrame objects (GH 42631, GH 43142, GH 43147, GH 43307, GH 43144, GH 44826)
Performance improvement in DataFrameGroupBy.shift() and SeriesGroupBy.shift() when fill_value argument is provided (GH 26615)
Performance improvement in DataFrame.corr() for method=pearson on data without missing values (GH 40956)
Performance improvement in some DataFrameGroupBy.apply() and SeriesGroupBy.apply() operations (GH 42992, GH 43578)
Performance improvement in read_stata() (GH 43059, GH 43227)
Performance improvement in read_sas() (GH 43333)
Performance improvement in to_datetime() with uint dtypes (GH 42606)
Performance improvement in to_datetime() with infer_datetime_format set to True (GH 43901)
Performance improvement in Series.sparse.to_coo() (GH 42880)
Performance improvement in indexing with a UInt64Index (GH 43862)
Performance improvement in indexing with a Float64Index (GH 43705)
Performance improvement in indexing with a non-unique Index (GH 43792)
Performance improvement in indexing with a listlike indexer on a MultiIndex (GH 43370)
Performance improvement in indexing with a MultiIndex indexer on another MultiIndex (GH 43370)
Performance improvement in DataFrameGroupBy.quantile() and SeriesGroupBy.quantile() (GH 43469, GH 43725)
Performance improvement in DataFrameGroupBy.count() and SeriesGroupBy.count() (GH 43730, GH 43694)
Performance improvement in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() (GH 43675, GH 42841)
Performance improvement in DataFrameGroupBy.std() and SeriesGroupBy.std() (GH 43115, GH 43576)
Performance improvement in DataFrameGroupBy.cumsum() and SeriesGroupBy.cumsum() (GH 43309)
SparseArray.min() and SparseArray.max() no longer require converting to a dense array (GH 43526)
Indexing into a SparseArray with a slice with step=1 no longer requires converting to a dense array (GH 43777)
Performance improvement in SparseArray.take() with allow_fill=False (GH 43654)
Performance improvement in Rolling.mean(), Expanding.mean(), Rolling.sum(), Expanding.sum(), Rolling.max(), Expanding.max(), Rolling.min() and Expanding.min() with engine="numba" (GH 43612, GH 44176, GH 45170)
Improved performance of pandas.read_csv() with memory_map=True when file encoding is UTF-8 (GH 43787)
Performance improvement in RangeIndex.sort_values() overriding Index.sort_values() (GH 43666)
Performance improvement in RangeIndex.insert() (GH 43988)
Performance improvement in Index.insert() (GH 43953)
Performance improvement in DatetimeIndex.tolist() (GH 43823)
Performance improvement in DatetimeIndex.union() (GH 42353)
Performance improvement in Series.nsmallest() (GH 43696)
Performance improvement in DataFrame.insert() (GH 42998)
Performance improvement in DataFrame.dropna() (GH 43683)
Performance improvement in DataFrame.fillna() (GH 43316)
Performance improvement in DataFrame.values() (GH 43160)
Performance improvement in DataFrame.select_dtypes() (GH 42611)
Performance improvement in DataFrame reductions (GH 43185, GH 43243, GH 43311, GH 43609)
Performance improvement in Series.unstack() and DataFrame.unstack() (GH 43335, GH 43352, GH 42704, GH 43025)
Performance improvement in Series.to_frame() (GH 43558)
Performance improvement in Series.mad() (GH 43010)
Performance improvement in to_csv() when index column is a datetime and is formatted (GH 39413)
Performance improvement in to_csv() when MultiIndex contains a lot of unused levels (GH 37484)
Performance improvement in read_csv() when index_col was set with a numeric column (GH 44158)
Performance improvement in SparseArray.__getitem__() (GH 23122)
Performance improvement in constructing a DataFrame from array-like objects like a Pytorch tensor (GH 44616)
Bug fixes
Categorical
Bug in setting dtype-incompatible values into a Categorical (or Series or DataFrame backed by Categorical) raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.searchsorted() when passing a dtype-incompatible value raising KeyError instead of TypeError (GH 41919)
Bug in Categorical.astype() casting datetimes and Timestamp to int for dtype object (GH 44930)
Bug in Series.where() with CategoricalDtype when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() when passing a dtype-incompatible value raising ValueError instead of TypeError (GH 41919)
Bug in Categorical.fillna() with a tuple-like category raising ValueError instead of TypeError when filling with a non-category tuple (GH 41919)
Datetimelike#
Bug in DataFrame constructor unnecessarily copying non-datetimelike 2D object arrays (GH 39272)
Bug in to_datetime() with format and pandas.NA raising ValueError (GH 42957)
to_datetime() would silently swap MM/DD/YYYY and DD/MM/YYYY formats if the given dayfirst option could not be respected - now, a warning is raised in the case of delimited date strings (e.g. 31-12-2012) (GH 12585)
Bug in date_range() and bdate_range() not returning the right bound when start=end and the set is closed on one side (GH 43394)
Bug in inplace addition and subtraction of DatetimeIndex or TimedeltaIndex with DatetimeArray or TimedeltaArray (GH 43904)
Bug in calling np.isnan, np.isfinite, or np.isinf on a timezone-aware DatetimeIndex incorrectly raising TypeError (GH 43917)
Bug in constructing a Series from datetime-like strings with mixed timezones incorrectly partially-inferring datetime values (GH 40111)
Bug in addition of a Tick object and a np.timedelta64 object incorrectly raising instead of returning Timedelta (GH 44474)
np.maximum.reduce and np.minimum.reduce now correctly return Timestamp and Timedelta objects when operating on Series, DataFrame, or Index with datetime64[ns] or timedelta64[ns] dtype (GH 43923)
Bug in adding a np.timedelta64 object to a BusinessDay or CustomBusinessDay object incorrectly raising (GH 44532)
Bug in Index.insert() for inserting np.datetime64, np.timedelta64 or tuple into Index with dtype='object' with negative loc adding None and replacing existing value (GH 44509)
Bug in Timestamp.to_pydatetime() failing to retain the fold attribute (GH 45087)
Bug in Series.mode() with DatetimeTZDtype incorrectly returning timezone-naive and PeriodDtype incorrectly raising (GH 41927)
Fixed regression in reindex() raising an error when using an incompatible fill value with a datetime-like dtype (or not raising a deprecation warning for using a datetime.date as fill value) (GH 42921)
Bug in DateOffset addition with Timestamp where offset.nanoseconds would not be included in the result (GH 43968, GH 36589)
Bug in Timestamp.fromtimestamp() not supporting the tz argument (GH 45083)
Bug in DataFrame construction from dict of Series with mismatched index dtypes sometimes raising depending on the ordering of the passed dict (GH 44091)
Bug in Timestamp hashing during some DST transitions causing a segmentation fault (GH 33931 and GH 40817)
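Two of the entries above can be sketched briefly: Timestamp.fromtimestamp() with a tz argument, and np.maximum.reduce on datetime64 data. This assumes pandas 1.4 or later; the dates are arbitrary sample values.

```python
import numpy as np
import pandas as pd

# Timestamp.fromtimestamp() accepts a tz argument (GH 45083).
epoch_utc = pd.Timestamp.fromtimestamp(0, tz="UTC")

# np.maximum.reduce on datetime64 data gives back a Timestamp (GH 43923).
ser = pd.Series(pd.to_datetime(["2021-01-01", "2021-06-01"]))
latest = np.maximum.reduce(ser)

print(epoch_utc, type(latest).__name__)
```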
Timedelta#
Bug in division of all-NaT TimedeltaIndex, Series or DataFrame column with object-dtype array-like of numbers failing to infer the result as timedelta64-dtype (GH 39750)
Bug in floor division of timedelta64[ns] data with a scalar returning garbage values (GH 44466)
Timedelta now properly takes into account any nanoseconds contribution of any kwarg (GH 43764, GH 45227)
Time Zones#
Bug in to_datetime() with infer_datetime_format=True failing to parse zero UTC offset (Z) correctly (GH 41047)
Bug in Series.dt.tz_convert() resetting index in a Series with CategoricalIndex (GH 43080)
Bug in Timestamp and DatetimeIndex incorrectly raising a TypeError when subtracting two timezone-aware objects with mismatched timezones (GH 31793)
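The last entry can be illustrated with two timezone-aware Timestamp objects in different zones. This is a minimal sketch assuming pandas 1.4 or later; fixed UTC offsets from the standard library are used here only to avoid depending on a timezone database.

```python
from datetime import timedelta, timezone

import pandas as pd

utc_noon = pd.Timestamp("2021-01-01 12:00", tz=timezone.utc)
minus5_noon = pd.Timestamp("2021-01-01 12:00", tz=timezone(timedelta(hours=-5)))

# Both operands are converted to a common instant before subtracting.
diff = utc_noon - minus5_noon
print(diff)
```

Noon at UTC-5 is 17:00 UTC, so the difference is minus five hours.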
Numeric#
Bug in floor-dividing a list or tuple of integers by a Series incorrectly raising (GH 44674)
Bug in DataFrame.rank() raising ValueError with object columns and method="first" (GH 41931)
Bug in DataFrame.rank() treating missing values and extreme values as equal (for example np.nan and np.inf), causing incorrect results when na_option="bottom" or na_option="top" is used (GH 41931)
Bug in numexpr engine still being used when the option compute.use_numexpr is set to False (GH 32556)
Bug in DataFrame arithmetic ops with a subclass whose _constructor() attribute is a callable other than the subclass itself (GH 43201)
Bug in arithmetic operations involving RangeIndex where the result would have the incorrect name (GH 43962)
Bug in arithmetic operations involving Series where the result could have the incorrect name when the operands have matching NA or matching tuple names (GH 44459)
Bug in division with IntegerDtype or BooleanDtype array and NA scalar incorrectly raising (GH 44685)
Bug in multiplying a Series with FloatingDtype with a timedelta-like scalar incorrectly raising (GH 44772)
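The rank() and NA-division entries above can be sketched as follows, assuming pandas 1.4 or later. The column name and values are sample data chosen for the example.

```python
import numpy as np
import pandas as pd

# np.inf and np.nan are ranked separately when NaN goes to the bottom (GH 41931).
df = pd.DataFrame({"a": [np.inf, np.nan, 1.0]})
ranks = df.rank(na_option="bottom")["a"].tolist()

# Dividing a nullable integer array by pd.NA yields an all-missing result (GH 44685).
na_div = pd.array([1, 2], dtype="Int64") / pd.NA

print(ranks, na_div.isna().all())
```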
Conversion#
Bug in UInt64Index constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (GH 42201)
Bug in Series constructor returning 0 for missing values with dtype int64 and False for dtype bool (GH 43017, GH 43018)
Bug in constructing a DataFrame from a PandasArray containing Series objects behaving differently than an equivalent np.ndarray (GH 43986)
Bug in IntegerDtype not allowing coercion from string dtype (GH 25472)
Bug in to_datetime() with arg:xr.DataArray and unit="ns" specified raising TypeError (GH 44053)
Bug in DataFrame.convert_dtypes() not returning the correct type when a subclass does not overload _constructor_sliced() (GH 43201)
Bug in DataFrame.astype() not propagating attrs from the original DataFrame (GH 44414)
Bug in DataFrame.convert_dtypes() result losing columns.names (GH 41435)
Bug in constructing an IntegerArray from pyarrow data failing to validate dtypes (GH 44891)
Bug in Series.astype() not allowing converting from a PeriodDtype to datetime64 dtype, inconsistent with the PeriodIndex behavior (GH 45038)
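A short sketch of two conversion entries above, attrs propagation through astype() and PeriodDtype to datetime64 conversion. It assumes pandas 1.4 or later; the "source" key and its value are hypothetical metadata.

```python
import pandas as pd

# attrs set on the original frame are carried over by astype() (GH 44414).
df = pd.DataFrame({"a": [1, 2]})
df.attrs["source"] = "sensor-1"  # hypothetical metadata
converted = df.astype("float64")

# A Series with PeriodDtype can be cast to datetime64 (GH 45038).
periods = pd.Series(pd.period_range("2021-01", periods=2, freq="M"))
stamps = periods.astype("datetime64[ns]")

print(converted.attrs, stamps.tolist())
```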
Strings#
Bug in checking for string[pyarrow] dtype incorrectly raising an ImportError when pyarrow is not installed (GH 44276)
Interval#
Bug in Series.where() with IntervalDtype incorrectly raising when the where call should not replace anything (GH 44181)
Indexing#
Bug in Series.rename() with MultiIndex when level is provided (GH 43659)
Bug in DataFrame.truncate() and Series.truncate() when the object’s Index has a length greater than one but only one unique value (GH 42365)
Bug in Series.loc() and DataFrame.loc() with a MultiIndex when indexing with a tuple in which one of the levels is also a tuple (GH 27591)
Bug in Series.loc() with a MultiIndex whose first level contains only np.nan values (GH 42055)
Bug in indexing on a Series or DataFrame with a DatetimeIndex when passing a string, where the return type depended on whether the index was monotonic (GH 24892)
Bug in indexing on a MultiIndex failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (GH 42476)
Bug in DataFrame.sort_values() and Series.sort_values() when passing an ascending value, failing to raise or incorrectly raising ValueError (GH 41634)
Bug in updating values of pandas.Series using boolean index, created by using pandas.DataFrame.pop() (GH 42530)
Bug in Index.get_indexer_non_unique() when index contains multiple np.nan (GH 35392)
Bug in DataFrame.query() not handling the degree sign in a backticked column name, such as `Temp(°C)`, used in an expression to query a DataFrame (GH 42826)
Bug in DataFrame.drop() where the error message did not show missing labels with commas when raising KeyError (GH 42881)
Bug in DataFrame.query() where method calls in query strings led to errors when the numexpr package was installed (GH 22435)
Bug in DataFrame.nlargest() and Series.nlargest() where sorted result did not count indexes containing np.nan (GH 28984)
Bug in indexing on a non-unique object-dtype Index with an NA scalar (e.g. np.nan) (GH 43711)
Bug in DataFrame.__setitem__() incorrectly writing into an existing column’s array rather than setting a new array when the new dtype and the old dtype match (GH 43406)
Bug in setting floating-dtype values into a Series with integer dtype failing to set inplace when those values can be losslessly converted to integers (GH 44316)
Bug in Series.__setitem__() with object dtype when setting an array with matching size and dtype='datetime64[ns]' or dtype='timedelta64[ns]' incorrectly converting the datetime/timedeltas to integers (GH 43868)
Bug in DataFrame.sort_index() where ignore_index=True was not being respected when the index was already sorted (GH 43591)
Bug in Index.get_indexer_non_unique() when index contains multiple np.datetime64("NaT") and np.timedelta64("NaT") (GH 43869)
Bug in setting a scalar Interval value into a Series with IntervalDtype when the scalar’s sides are floats and the values’ sides are integers (GH 44201)
Bug when setting string-backed Categorical values that can be parsed to datetimes into a DatetimeArray or Series or DataFrame column backed by DatetimeArray failing to parse these strings (GH 44236)
Bug in Series.__setitem__() with an integer dtype other than int64 setting with a range object unnecessarily upcasting to int64 (GH 44261)
Bug in Series.__setitem__() with a boolean mask indexer setting a listlike value of length 1 incorrectly broadcasting that value (GH 44265)
Bug in Series.reset_index() not ignoring name argument when drop and inplace are set to True (GH 44575)
Bug in DataFrame.loc.__setitem__() and DataFrame.iloc.__setitem__() with mixed dtypes sometimes failing to operate in-place (GH 44345)
Bug in DataFrame.loc.__getitem__() incorrectly raising KeyError when selecting a single column with a boolean key (GH 44322)
Bug in setting DataFrame.iloc() with a single ExtensionDtype column and setting 2D values e.g. df.iloc[:] = df.values incorrectly raising (GH 44514)
Bug in setting values with DataFrame.iloc() with a single ExtensionDtype column and a tuple of arrays as the indexer (GH 44703)
Bug in indexing on columns with loc or iloc using a slice with a negative step with ExtensionDtype columns incorrectly raising (GH 44551)
Bug in DataFrame.loc.__setitem__() changing dtype when indexer was completely False (GH 37550)
Bug in IntervalIndex.get_indexer_non_unique() returning boolean mask instead of array of integers for a non unique and non monotonic index (GH 44084)
Bug in IntervalIndex.get_indexer_non_unique() not handling targets of dtype 'object' with NaNs correctly (GH 44482)
Fixed regression where a single column np.matrix was no longer coerced to a 1d np.ndarray when added to a DataFrame (GH 42376)
Bug in Series.__getitem__() with a CategoricalIndex of integers treating lists of integers as positional indexers, inconsistent with the behavior with a single scalar integer (GH 15470, GH 14865)
Bug in Series.__setitem__() when setting floats or integers into integer-dtype Series failing to upcast when necessary to retain precision (GH 45121)
Bug in DataFrame.iloc.__setitem__() ignoring the axis argument (GH 45032)
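As one concrete case from the list above, the sort_index() entry (GH 43591) can be sketched like this, assuming pandas 1.4 or later. The frame and its index labels are sample data.

```python
import pandas as pd

# The index is already sorted, which is the case the fix addresses.
df = pd.DataFrame({"a": [1, 2, 3]}, index=[10, 20, 30])

result = df.sort_index(ignore_index=True)
print(result.index.tolist())
```

The result carries a fresh 0..n-1 index, while the original frame keeps its labels.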
Missing#
Bug in DataFrame.fillna() with limit and no method ignoring axis='columns' or axis=1 (GH 40989, GH 17399)
Bug in DataFrame.fillna() not replacing missing values when using a dict-like value and duplicate column names (GH 43476)
Bug in constructing a DataFrame with a dictionary with np.datetime64 as a value and dtype='timedelta64[ns]', or vice-versa, incorrectly casting instead of raising (GH 44428)
Bug in Series.interpolate() and DataFrame.interpolate() with inplace=True not writing to the underlying array(s) in-place (GH 44749)
Bug in Index.fillna() incorrectly returning an unfilled Index when NA values are present and downcast argument is specified. This now raises NotImplementedError instead; do not pass downcast argument (GH 44873)
Bug in DataFrame.dropna() changing Index even if no entries were dropped (GH 41965)
Bug in Series.fillna() with an object-dtype incorrectly ignoring downcast="infer" (GH 44241)
MultiIndex#
Bug in MultiIndex.get_loc() where the first level is a DatetimeIndex and a string key is passed (GH 42465)
Bug in MultiIndex.reindex() when passing a level that corresponds to an ExtensionDtype level (GH 42043)
Bug in MultiIndex.get_loc() raising TypeError instead of KeyError on nested tuple (GH 42440)
Bug in MultiIndex.union() setting wrong sortorder causing errors in subsequent indexing operations with slices (GH 44752)
Bug in MultiIndex.putmask() where the other value was also a MultiIndex (GH 43212)
Bug in MultiIndex.dtypes() where duplicate level names returned only one dtype per name (GH 45174)
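The dtypes entry (GH 45174) can be illustrated with a MultiIndex whose two levels share a name. This is a minimal sketch assuming pandas 1.4 or later; the level name "x" is arbitrary.

```python
import pandas as pd

# Both levels are deliberately given the same name.
mi = pd.MultiIndex.from_arrays([[1, 2], ["a", "b"]], names=["x", "x"])

dtypes = mi.dtypes
print(dtypes)
```

One dtype is reported per level, so the result has two entries even though the names collide.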
I/O#
Bug in read_excel() attempting to read chart sheets from .xlsx files (GH 41448)
Bug in json_normalize() where errors="ignore" could fail to ignore missing values of meta when record_path has a length greater than one (GH 41876)
Bug in read_csv() with multi-header input and arguments referencing column names as tuples (GH 42446)
Bug in read_fwf(), where difference in lengths of colspecs and names was not raising ValueError (GH 40830)
Bug in Series.to_json() and DataFrame.to_json() where some attributes were skipped when serializing plain Python objects to JSON (GH 42768, GH 33043)
Bug where column headers were dropped when constructing a DataFrame from a sqlalchemy Row object (GH 40682)
Bug in unpickling an Index with object dtype incorrectly inferring numeric dtypes (GH 43188)
Bug in read_csv() where reading multi-header input with unequal lengths incorrectly raised IndexError (GH 43102)
Bug in read_csv() raising ParserError when reading file in chunks and some chunk blocks have fewer columns than header for engine="c" (GH 21211)
Bug in read_csv() raising OSError instead of TypeError when expecting a file path name or file-like object (GH 43366)
Bug in read_csv() and read_fwf() ignoring all skiprows except first when nrows is specified for engine='python' (GH 44021, GH 10261)
Bug in read_csv() keeping the original column in object format when keep_date_col=True is set (GH 13378)
Bug in read_json() not handling non-numpy dtypes correctly (especially category) (GH 21892, GH 33205)
Bug in json_normalize() where multi-character sep parameter is incorrectly prefixed to every key (GH 43831)
Bug in json_normalize() where reading data with missing multi-level metadata would not respect errors="ignore" (GH 44312)
Bug in read_csv() using the second row to guess implicit index if header was set to None for engine="python" (GH 22144)
Bug in read_csv() not recognizing bad lines when names were given for engine="c" (GH 22144)
Bug in read_csv() with float_precision="round_trip" which did not skip initial/trailing whitespace (GH 43713)
Bug when Python is built without the lzma module: a warning was raised at the pandas import time, even if the lzma capability isn’t used (GH 43495)
Bug in read_csv() not applying dtype for index_col (GH 9435)
Bug in dumping/loading a DataFrame with yaml.dump(frame) (GH 42748)
Bug in read_csv() raising ValueError when names was longer than header but equal to data rows for engine="python" (GH 38453)
Bug in ExcelWriter, where engine_kwargs were not passed through to all engines (GH 43442)
Bug in read_csv() raising ValueError when parse_dates was used with MultiIndex columns (GH 8991)
Bug in read_csv() not raising a ValueError when \n was specified as delimiter or sep which conflicts with lineterminator (GH 43528)
Bug in to_csv() converting datetimes in categorical Series to integers (GH 40754)
Bug in read_csv() converting columns to numeric after date parsing failed (GH 11019)
Bug in read_csv() not replacing NaN values with np.nan before attempting date conversion (GH 26203)
Bug in read_csv() raising AttributeError when attempting to read a .csv file and infer index column dtype from a nullable integer type (GH 44079)
Bug in to_csv() always coercing datetime columns with different formats to the same format (GH 21734)
DataFrame.to_csv() and Series.to_csv() with compression set to 'zip' no longer create a zip file containing a file ending with “.zip”. Instead, they try to infer the inner file name more smartly (GH 39465)
Bug in read_csv() where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (GH 42808, GH 34120)
Bug in to_xml() raising error for pd.NA with extension array dtype (GH 43903)
Bug in read_csv() when passing simultaneously a parser in date_parser and parse_dates=False, the parsing was still called (GH 44366)
Bug in read_csv() not setting name of MultiIndex columns correctly when index_col is not the first column (GH 38549)
Bug in read_csv() silently ignoring errors when failing to create a memory-mapped file (GH 44766)
Bug in read_csv() when passing a tempfile.SpooledTemporaryFile opened in binary mode (GH 44748)
Bug in read_json() raising ValueError when attempting to parse json strings containing “://” (GH 36271)
Bug in read_csv() when engine="c" and encoding_errors=None which caused a segfault (GH 45180)
Bug in read_csv() where an invalid value of usecols led to an unclosed file handle (GH 45384)
Fixed memory leak in DataFrame.to_json() (GH 43877)
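Two of the read_csv() entries above lend themselves to a small sketch: dtype being applied to the index column (GH 9435) and "\n" being rejected as a separator (GH 43528). It assumes pandas 1.4 or later; the CSV text is made-up sample input.

```python
from io import StringIO

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"

# The requested dtype is applied to the column used as the index.
df = pd.read_csv(StringIO(csv_text), index_col="a", dtype={"a": str})

# Using the line terminator as the separator is rejected up front.
try:
    pd.read_csv(StringIO(csv_text), sep="\n")
except ValueError as err:
    sep_error = type(err).__name__
else:
    sep_error = None

print(df.index.tolist(), sep_error)
```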
Period#
Bug in adding a Period object to a np.timedelta64 object incorrectly raising TypeError (GH 44182)
Bug in PeriodIndex.to_timestamp() when the index has freq="B" inferring freq="D" for its result instead of freq="B" (GH 44105)
Bug in Period constructor incorrectly allowing np.timedelta64("NaT") (GH 44507)
Bug in PeriodIndex.to_timestamp() giving incorrect values for indexes with non-contiguous data (GH 44100)
Bug in Series.where() with PeriodDtype incorrectly raising when the where call should not replace anything (GH 45135)
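Arithmetic between a Period and a np.timedelta64, the subject of the first entry above, might look like the following. This sketch assumes pandas 1.4 or later and shows only the Period-on-the-left ordering; the date is a sample value.

```python
import numpy as np
import pandas as pd

day = pd.Period("2021-01-01", freq="D")

# A one-day timedelta is compatible with daily frequency.
next_day = day + np.timedelta64(1, "D")
print(next_day)
```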
Plotting#
When given non-numeric data, DataFrame.boxplot() now raises a ValueError rather than a cryptic KeyError or ZeroDivisionError, in line with other plotting functions like DataFrame.hist() (GH 43480)
Groupby/resample/rolling#
Bug in SeriesGroupBy.apply() where passing an unrecognized string argument failed to raise TypeError when the underlying Series is empty (GH 42021)
Bug in Series.rolling.apply(), DataFrame.rolling.apply(), Series.expanding.apply() and DataFrame.expanding.apply() with engine="numba" where *args were being cached with the user passed function (GH 42287)
Bug in DataFrameGroupBy.max(), SeriesGroupBy.max(), DataFrameGroupBy.min(), and SeriesGroupBy.min() with nullable integer dtypes losing precision (GH 41743)
Bug in DataFrame.groupby.rolling.var() where the rolling variance was calculated only on the first group (GH 42442)
Bug in DataFrameGroupBy.shift() and SeriesGroupBy.shift() that would return the grouping columns if fill_value was not None (GH 41556)
Bug in SeriesGroupBy.nlargest() and SeriesGroupBy.nsmallest() having an inconsistent index when the input Series was sorted and n was greater than or equal to all group sizes (GH 15272, GH 16345, GH 29129)
Bug in pandas.DataFrame.ewm(), where non-float64 dtypes were silently failing (GH 42452)
Bug in pandas.DataFrame.rolling() operation along rows (axis=1) incorrectly omitting columns containing float16 and float32 (GH 41779)
Bug in Resampler.aggregate() not allowing the use of Named Aggregation (GH 32803)
Bug in Series.rolling() when the Series dtype was Int64 (GH 43016)
Bug in DataFrame.rolling.corr() when the DataFrame columns were a MultiIndex (GH 21157)
Bug in DataFrame.groupby.rolling() where specifying on and calling __getitem__ would subsequently return incorrect results (GH 43355)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() with time-based Grouper objects incorrectly raising ValueError in corner cases where the grouping vector contains a NaT (GH 43500, GH 43515)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() failing with complex dtype (GH 43701)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly for the first row when center=True and index is decreasing (GH 43927)
Bug in Series.rolling() and DataFrame.rolling() for centered datetimelike windows with uneven nanosecond (GH 43997)
Bug in DataFrameGroupBy.mean() and SeriesGroupBy.mean() raising KeyError when column was selected at least twice (GH 44924)
Bug in DataFrameGroupBy.nth() and SeriesGroupBy.nth() failing on axis=1 (GH 43926)
Bug in Series.rolling() and DataFrame.rolling() not respecting right bound on centered datetime-like windows, if the index contains duplicates (GH 3944)
Bug in Series.rolling() and DataFrame.rolling() when using a pandas.api.indexers.BaseIndexer subclass that returned unequal start and end arrays segfaulting instead of raising a ValueError (GH 44470)
Bug in GroupBy.nunique() not respecting observed=True for categorical grouping columns (GH 45128)
Bug in DataFrameGroupBy.head(), SeriesGroupBy.head(), DataFrameGroupBy.tail(), and SeriesGroupBy.tail() not dropping groups with NaN when dropna=True (GH 45089)
Bug in GroupBy.__iter__() after selecting a subset of columns in a GroupBy object, which returned all columns instead of the chosen subset (GH 44821)
Bug in GroupBy.rolling() failing to raise ValueError when non-monotonic data is passed (GH 43909)
Bug where grouping by a Series that has a categorical data type and length unequal to the axis of grouping raised ValueError (GH 44179)
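The GroupBy.__iter__() entry above (GH 44821) can be sketched by iterating over a column-subset selection, assuming pandas 1.4 or later. The column names "key", "b" and "c" are placeholders.

```python
import pandas as pd

df = pd.DataFrame({"key": ["x", "x", "y"], "b": [1, 2, 3], "c": [4, 5, 6]})

# Iterate over the groups after restricting the selection to column "b".
seen_columns = {key: list(group.columns) for key, group in df.groupby("key")[["b"]]}
print(seen_columns)
```

Each yielded group contains only the selected column.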
Reshaping#
Improved error message when creating a DataFrame column from a multi-dimensional numpy.ndarray (GH 42463)
Bug in concat() creating MultiIndex with duplicate level entries when concatenating a DataFrame with duplicates in Index and multiple keys (GH 42651)
Bug in pandas.cut() on Series with duplicate indices and non-exact pandas.CategoricalIndex() (GH 42185, GH 42425)
Bug in DataFrame.append() failing to retain dtypes when appended columns do not match (GH 43392)
Bug in concat() of bool and boolean dtypes resulting in object dtype instead of boolean dtype (GH 42800)
Bug in crosstab() when inputs are categorical Series, there are categories that are not present in one or both of the Series, and margins=True. Previously the margin value for missing categories was NaN. It is now correctly reported as 0 (GH 43505)
Bug in concat() failing when the objs argument all had the same index and the keys argument contained duplicates (GH 43595)
Bug in merge() with MultiIndex as column index for the on argument returning an error when assigning a column internally (GH 43734)
Bug in crosstab() failing when inputs are lists or tuples (GH 44076)
Bug in DataFrame.append() failing to retain index.name when appending a list of Series objects (GH 44109)
Fixed metadata propagation in DataFrame.apply() method, consequently fixing the same issue for DataFrame.transform(), DataFrame.nunique() and DataFrame.mode() (GH 28283)
Bug in concat() casting levels of MultiIndex to float if all levels only consist of missing values (GH 44900)
Bug in DataFrame.stack() with ExtensionDtype columns incorrectly raising (GH 43561)
Bug in merge() raising KeyError when joining over differently named indexes with on keywords (GH 45094)
Bug in Series.unstack() with object dtype doing unwanted type inference on resulting columns (GH 44595)
Bug in MultiIndex.join() with overlapping IntervalIndex levels (GH 44096)
Bug in DataFrame.replace() and Series.replace() resulting in a different dtype depending on the regex parameter (GH 44864)
Bug in DataFrame.pivot() with index=None when the DataFrame index was a MultiIndex (GH 23955)
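Two reshaping entries above, concat() of bool and boolean dtypes (GH 42800) and crosstab() with plain lists (GH 44076), can be sketched as follows. This assumes pandas 1.4 or later; all values are sample data.

```python
import pandas as pd

# Mixing NumPy bool with the nullable boolean dtype keeps the nullable dtype.
combined = pd.concat(
    [
        pd.Series([True, False], dtype="bool"),
        pd.Series([True, pd.NA], dtype="boolean"),
    ],
    ignore_index=True,
)

# crosstab() accepts plain Python lists as inputs.
table = pd.crosstab(["a", "b", "a"], ["x", "y", "x"])

print(combined.dtype, table.loc["a", "x"])
```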
Sparse#
Bug in DataFrame.sparse.to_coo() raising AttributeError when column names are not unique (GH 29564)
Bug in SparseArray.max() and SparseArray.min() raising ValueError for arrays with 0 non-null elements (GH 43527)
Bug in DataFrame.sparse.to_coo() silently converting non-zero fill values to zero (GH 24817)
Bug in SparseArray comparison methods with an array-like operand of mismatched length raising AssertionError or unclear ValueError depending on the input (GH 43863)
Bug in SparseArray arithmetic methods floordiv and mod behaviors when dividing by zero not matching the non-sparse Series behavior (GH 38172)
Bug in SparseArray unary methods as well as SparseArray.isna() not recalculating indexes (GH 44955)
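The SparseArray.max() entry above (GH 43527) concerns arrays with no non-null elements. A minimal sketch, assuming pandas 1.4 or later and an all-NaN float array as the sample input:

```python
import numpy as np
import pandas as pd

# Every element is missing, so there are no valid values to compare.
empty_sparse = pd.arrays.SparseArray([np.nan, np.nan])

largest = empty_sparse.max()
print(largest)
```

With no valid values, the reduction returns a missing value rather than raising.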
ExtensionArray#
NumPy ufuncs np.abs, np.positive, np.negative now correctly preserve dtype when called on ExtensionArrays that implement __abs__, __pos__, __neg__, respectively. In particular this is fixed for TimedeltaArray (GH 43899, GH 23316)
NumPy ufuncs np.minimum.reduce, np.maximum.reduce, np.add.reduce, and np.prod.reduce now work correctly instead of raising NotImplementedError on Series with IntegerDtype or FloatDtype (GH 43923, GH 44793)
NumPy ufuncs with out keyword are now supported by arrays with IntegerDtype and FloatingDtype (GH 45122)
Avoid raising PerformanceWarning about fragmented DataFrame when using many columns with an extension dtype (GH 44098)
Bug in IntegerArray and FloatingArray construction incorrectly coercing mismatched NA values (e.g. np.timedelta64("NaT")) to numeric NA (GH 44514)
Bug in BooleanArray.__eq__() and BooleanArray.__ne__() raising TypeError on comparison with an incompatible type (like a string). This caused DataFrame.replace() to sometimes raise a TypeError if a nullable boolean column was included (GH 44499)
Bug in array() incorrectly raising when passed a ndarray with float16 dtype (GH 44715)
Bug in calling np.sqrt on BooleanArray returning a malformed FloatingArray (GH 44715)
Bug in Series.where() with ExtensionDtype when other is a NA scalar incompatible with the Series dtype (e.g. NaT with a numeric dtype) incorrectly casting to a compatible NA value (GH 44697)
Bug in Series.replace() where explicitly passing value=None is treated as if no value was passed, and None not being in the result (GH 36984, GH 19998)
Bug in Series.replace() with unwanted downcasting being done in no-op replacements (GH 44498)
Bug in Series.replace() with FloatDtype, string[python], or string[pyarrow] dtype not being preserved when possible (GH 33484, GH 40732, GH 31644, GH 41215, GH 25438)
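The first two ufunc entries above can be sketched with np.add.reduce on a nullable integer Series and np.abs on timedelta data. This assumes pandas 1.4 or later; the values are sample data.

```python
import numpy as np
import pandas as pd

# np.add.reduce works on a Series with a nullable integer dtype.
total = np.add.reduce(pd.Series([1, 2, 3], dtype="Int64"))

# np.abs keeps the timedelta dtype instead of falling back to numbers.
deltas = pd.Series(pd.to_timedelta([-1, 2], unit="s"))
magnitudes = np.abs(deltas)

print(total, magnitudes.dtype)
```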
Styler#
Bug in Styler where the uuid at initialization maintained a floating underscore (GH 43037)
Bug in Styler.to_html() where the Styler object was updated if the to_html method was called with some args (GH 43034)
Bug in Styler.copy() where uuid was not previously copied (GH 40675)
Bug in Styler.apply() where functions which returned Series objects were not correctly handled in terms of aligning their index labels (GH 13657, GH 42014)
Bug when rendering an empty DataFrame with a named Index (GH 43305)
Bug when rendering a single level MultiIndex (GH 43383)
Bug when combining non-sparse rendering and Styler.hide_columns() or Styler.hide_index() (GH 43464)
Bug setting a table style when using multiple selectors in Styler (GH 44011)
Bugs where row trimming and column trimming failed to reflect hidden rows (GH 43703, GH 44247)
Other#
Bug in DataFrame.astype() with non-unique columns and a Series dtype argument (GH 44417)
Bug in CustomBusinessMonthBegin.__add__() (CustomBusinessMonthEnd.__add__()) not applying the extra offset parameter when beginning (end) of the target month is already a business day (GH 41356)
Bug in RangeIndex.union() with another RangeIndex with matching (even) step and starts differing by strictly less than step / 2 (GH 44019)
Bug in RangeIndex.difference() with sort=None and step<0 failing to sort (GH 44085)
Bug in Series.replace() and DataFrame.replace() with value=None and ExtensionDtypes (GH 44270, GH 37899)
Bug in FloatingArray.equals() failing to consider two arrays equal if they contain np.nan values (GH 44382)
Bug in DataFrame.shift() with axis=1 and ExtensionDtype columns incorrectly raising when an incompatible fill_value is passed (GH 44564)
Bug in DataFrame.shift() with axis=1 and periods larger than len(frame.columns) producing an invalid DataFrame (GH 44978)
Bug in DataFrame.diff() when passing a NumPy integer object instead of an int object (GH 44572)
Bug in Series.replace() raising ValueError when using regex=True with a Series containing np.nan values (GH 43344)
Bug in DataFrame.to_records() where an incorrect n was used when missing names were replaced by level_n (GH 44818)
Bug in DataFrame.eval() where resolvers argument was overriding the default resolvers (GH 34966)
Series.__repr__() and DataFrame.__repr__() no longer replace all null-values in indexes with “NaN” but use their real string-representations. “NaN” is used only for float("nan") (GH 45263)
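The DataFrame.diff() entry above (GH 44572) can be illustrated by passing a NumPy integer as periods. A minimal sketch assuming pandas 1.4 or later; the column values are sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 3, 6]})

step = np.int64(1)  # a NumPy integer rather than a built-in int
changes = df.diff(step)["a"].tolist()
print(changes)
```

The first row has no predecessor, so its difference is missing; the remaining rows hold the step-to-step changes.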
Contributors#
A total of 275 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Abhishek R
Albert Villanova del Moral
Alessandro Bisiani +
Alex Lim
Alex-Gregory-1 +
Alexander Gorodetsky
Alexander Regueiro +
Alexey Györi
Alexis Mignon
Aleš Erjavec
Ali McMaster
Alibi +
Andrei Batomunkuev +
Andrew Eckart +
Andrew Hawyrluk
Andrew Wood
Anton Lodder +
Armin Berres +
Arushi Sharma +
Benedikt Heidrich +
Beni Bienz +
Benoît Vinot
Bert Palm +
Boris Rumyantsev +
Brian Hulette
Brock
Bruno Costa +
Bryan Racic +
Caleb Epstein
Calvin Ho
ChristofKaufmann +
Christopher Yeh +
Chuliang Xiao +
ClaudiaSilver +
DSM
Daniel Coll +
Daniel Schmidt +
Dare Adewumi
David +
David Sanders +
David Wales +
Derzan Chiang +
DeviousLab +
Dhruv B Shetty +
Digres45 +
Dominik Kutra +
Drew Levitt +
DriesS
EdAbati
Elle
Elliot Rampono
Endre Mark Borza
Erfan Nariman
Evgeny Naumov +
Ewout ter Hoeven +
Fangchen Li
Felix Divo
Felix Dulys +
Francesco Andreuzzi +
Francois Dion +
Frans Larsson +
Fred Reiss
GYvan
Gabriel Di Pardi Arruda +
Gesa Stupperich
Giacomo Caria +
Greg Siano +
Griffin Ansel
Hiroaki Ogasawara +
Horace +
Horace Lai +
Irv Lustig
Isaac Virshup
JHM Darbyshire (MBP)
JHM Darbyshire (iMac)
JHM Darbyshire +
Jack Liu
Jacob Skwirsk +
Jaime Di Cristina +
James Holcombe +
Janosh Riebesell +
Jarrod Millman
Jason Bian +
Jeff Reback
Jernej Makovsek +
Jim Bradley +
Joel Gibson +
Joeperdefloep +
Johannes Mueller +
John S Bogaardt +
John Zangwill +
Jon Haitz Legarreta Gorroño +
Jon Wiggins +
Jonas Haag +
Joris Van den Bossche
Josh Friedlander
José Duarte +
Julian Fleischer +
Julien de la Bruère-T
Justin McOmie
Kadatatlu Kishore +
Kaiqi Dong
Kashif Khan +
Kavya9986 +
Kendall +
Kevin Sheppard
Kiley Hewitt
Koen Roelofs +
Krishna Chivukula
KrishnaSai2020
Leonardo Freua +
Leonardus Chen
Liang-Chi Hsieh +
Loic Diridollou +
Lorenzo Maffioli +
Luke Manley +
LunarLanding +
Marc Garcia
Marcel Bittar +
Marcel Gerber +
Marco Edward Gorelli
Marco Gorelli
MarcoGorelli
Marvin +
Mateusz Piotrowski +
Mathias Hauser +
Matt Richards +
Matthew Davis +
Matthew Roeschke
Matthew Zeitlin
Matthias Bussonnier
Matti Picus
Mauro Silberberg +
Maxim Ivanov
Maximilian Carr +
MeeseeksMachine
Michael Sarrazin +
Michael Wang +
Michał Górny +
Mike Phung +
Mike Taves +
Mohamad Hussein Rkein +
NJOKU OKECHUKWU VALENTINE +
Neal McBurnett +
Nick Anderson +
Nikita Sobolev +
Olivier Cavadenti +
PApostol +
Pandas Development Team
Patrick Hoefler
Peter
Peter Tillmann +
Prabha Arivalagan +
Pradyumna Rahul
Prerana Chakraborty
Prithvijit +
Rahul Gaikwad +
Ray Bell
Ricardo Martins +
Richard Shadrach
Robbert-jan ‘t Hoen +
Robert Voyer +
Robin Raymond +
Rohan Sharma +
Rohan Sirohia +
Roman Yurchak
Ruan Pretorius +
Sam James +
Scott Talbert
Shashwat Sharma +
Sheogorath27 +
Shiv Gupta
Shoham Debnath
Simon Hawkins
Soumya +
Stan West +
Stefanie Molin +
Stefano Alberto Russo +
Stephan Heßelmann
Stephen
Suyash Gupta +
Sven
Swanand01 +
Sylvain Marié +
TLouf
Tania Allard +
Terji Petersen
TheDerivator +
Thomas Dickson
Thomas Kastl +
Thomas Kluyver
Thomas Li
Thomas Smith
Tim Swast
Tim Tran +
Tobias McNulty +
Tobias Pitters
Tomoki Nakagawa +
Tony Hirst +
Torsten Wörtwein
V.I. Wood +
Vaibhav K +
Valentin Oliver Loftsson +
Varun Shrivastava +
Vivek Thazhathattil +
Vyom Pathak
Wenjun Si
William Andrea +
William Bradley +
Wojciech Sadowski +
Yao-Ching Huang +
Yash Gupta +
Yiannis Hadjicharalambous +
Yoshiki Vázquez Baeza
Yuanhao Geng
Yury Mikhaylov
Yvan Gatete +
Yves Delley +
Zach Rait
Zbyszek Królikowski +
Zero +
Zheyuan
Zhiyi Wu +
aiudirog
ali sayyah +
aneesh98 +
aptalca
arw2019 +
attack68
brendandrury +
bubblingoak +
calvinsomething +
claws +
deponovo +
dicristina
el-g-1 +
evensure +
fotino21 +
fshi01 +
gfkang +
github-actions[bot]
i-aki-y
jbrockmendel
jreback
juliandwain +
jxb4892 +
kendall smith +
lmcindewar +
lrepiton
maximilianaccardo +
michal-gh
neelmraman
partev
phofl +
pratyushsharan +
quantumalaviya +
rafael +
realead
rocabrera +
rosagold
saehuihwang +
salomondush +
shubham11941140 +
srinivasan +
stphnlyd
suoniq
trevorkask +
tushushu
tyuyoshi +
usersblock +
vernetya +
vrserpa +
willie3838 +
zeitlinv +
zhangxiaoxing +