What’s new in 1.3.0 (July 2, 2021)#
These are the changes in pandas 1.3.0. See Release notes for a full changelog including other versions of pandas.
Warning
When reading new Excel 2007+ (.xlsx) files, the default argument
engine=None to read_excel() will now result in using the
openpyxl engine in all cases
when the option io.excel.xlsx.reader is set to "auto".
Previously, some cases would use the
xlrd engine instead. See
What’s new 1.2.0 for background on this change.
Enhancements#
Custom HTTP(s) headers when reading csv or json files#
When reading from a remote URL that is not handled by fsspec (e.g. HTTP and
HTTPS) the dictionary passed to storage_options will be used to create the
headers included in the request. This can be used to control the User-Agent
header or send other custom headers (GH 36688).
For example:
In [1]: headers = {"User-Agent": "pandas"}
In [2]: df = pd.read_csv(
...: "https://download.bls.gov/pub/time.series/cu/cu.item",
...: sep="\t",
...: storage_options=headers
...: )
Read and write XML documents#
We added I/O support to read and render shallow versions of XML documents with
read_xml() and DataFrame.to_xml(). Using lxml as parser,
both XPath 1.0 and XSLT 1.0 are available. (GH 27554)
In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
...: <data>
...: <row>
...: <shape>square</shape>
...: <degrees>360</degrees>
...: <sides>4.0</sides>
...: </row>
...: <row>
...: <shape>circle</shape>
...: <degrees>360</degrees>
...: <sides/>
...: </row>
...: <row>
...: <shape>triangle</shape>
...: <degrees>180</degrees>
...: <sides>3.0</sides>
...: </row>
...: </data>"""
In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
shape degrees sides
0 square 360 4.0
1 circle 360 NaN
2 triangle 180 3.0
In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
<row>
<index>0</index>
<shape>square</shape>
<degrees>360</degrees>
<sides>4.0</sides>
</row>
<row>
<index>1</index>
<shape>circle</shape>
<degrees>360</degrees>
<sides/>
</row>
<row>
<index>2</index>
<shape>triangle</shape>
<degrees>180</degrees>
<sides>3.0</sides>
</row>
</data>
For more, see Writing XML in the user guide on IO tools.
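The round trip can also be sketched without lxml. This is a minimal sketch, assuming the standard-library backend (`parser="etree"`) is enough for a flat document; the frame contents and variable names are illustrative, and the XPath 1.0 and XSLT 1.0 features need lxml and are not shown.

```python
import io

import pandas as pd

# Illustrative data; these column names are not taken from the release notes.
shapes = pd.DataFrame({"shape": ["square", "triangle"], "degrees": [360, 180]})

# Serialize without the index, using the standard-library ElementTree backend.
xml_text = shapes.to_xml(index=False, parser="etree")

# Parse the document back; StringIO passes the text as a file-like object.
roundtrip = pd.read_xml(io.StringIO(xml_text), parser="etree")
```

Passing `index=False` keeps the `<index>` element out of each row, so the parsed frame has the same columns as the one that was written.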
Styler enhancements#
We provided some focused development on Styler. See also the Styler documentation
which has been revised and improved (GH 39720, GH 39317, GH 40493).
- The method Styler.set_table_styles() can now accept more natural CSS language for arguments, such as 'color:red;' instead of [('color', 'red')] (GH 39563)
- The methods Styler.highlight_null(), Styler.highlight_min(), and Styler.highlight_max() now allow custom CSS highlighting instead of the default background coloring (GH 40242)
- Styler.apply() now accepts functions that return an ndarray when axis=None, making it consistent with the axis=0 and axis=1 behavior (GH 39359)
- When incorrectly formatted CSS is given via Styler.apply() or Styler.applymap(), an error is now raised upon rendering (GH 39660)
- Styler.format() now accepts the keyword argument escape for optional HTML and LaTeX escaping (GH 40388, GH 41619)
- Styler.background_gradient() has gained the argument gmap to supply a specific gradient map for shading (GH 22727)
- Styler.clear() now clears Styler.hidden_index and Styler.hidden_columns as well (GH 40484)
- Added the method Styler.highlight_between() (GH 39821)
- Added the method Styler.highlight_quantile() (GH 40926)
- Added the method Styler.text_gradient() (GH 41098)
- Added the method Styler.set_tooltips() to allow hover tooltips; this can be used to enhance interactive displays (GH 21266, GH 40284)
- Added the parameter precision to the method Styler.format() to control the display of floating point numbers (GH 40134)
- Styler rendered HTML output now follows the w3 HTML Style Guide (GH 39626)
- Many features of the Styler class are now either partially or fully usable on a DataFrame with non-unique indexes or columns (GH 41143)
- One has greater control of the display through separate sparsification of the index or columns using the new styler options, which are also usable via option_context() (GH 41142)
- Added the option styler.render.max_elements to avoid browser overload when styling large DataFrames (GH 40712)
- Added the method Styler.to_latex() (GH 21673, GH 42320), which also allows some limited CSS conversion (GH 40731)
- Added the method Styler.to_html() (GH 13379)
- Added the method Styler.set_sticky() to make index and column headers permanently visible in scrolling HTML frames (GH 29072)
DataFrame constructor honors copy=False with dict#
When passing a dictionary to DataFrame with copy=False,
a copy will no longer be made (GH 32960).
In [1]: arr = np.array([1, 2, 3])
In [2]: df = pd.DataFrame({"A": arr, "B": arr.copy()}, copy=False)
In [3]: df
Out[3]:
A B
0 1 1
1 2 2
2 3 3
df["A"] remains a view on arr:
In [4]: arr[0] = 0
In [5]: assert df.iloc[0, 0] == 0
The default behavior when not passing copy will remain unchanged, i.e.
a copy will be made.
PyArrow backed string data type#
We’ve enhanced the StringDtype, an extension type dedicated to string data.
(GH 39908)
It is now possible to specify a storage keyword option to StringDtype. Use
pandas options or specify the dtype using dtype='string[pyarrow]' to allow the
StringArray to be backed by a PyArrow array instead of a NumPy array of Python objects.
The PyArrow backed StringArray requires pyarrow 1.0.0 or greater to be installed.
Warning
string[pyarrow] is currently considered experimental. The implementation
and parts of the API may change without warning.
In [6]: pd.Series(['abc', None, 'def'], dtype=pd.StringDtype(storage="pyarrow"))
Out[6]:
0 abc
1 <NA>
2 def
dtype: string
You can use the alias "string[pyarrow]" as well.
In [7]: s = pd.Series(['abc', None, 'def'], dtype="string[pyarrow]")
In [8]: s
Out[8]:
0 abc
1 <NA>
2 def
dtype: string
You can also create a PyArrow backed string array using pandas options.
In [9]: with pd.option_context("string_storage", "pyarrow"):
...: s = pd.Series(['abc', None, 'def'], dtype="string")
...:
In [10]: s
Out[10]:
0 abc
1 <NA>
2 def
dtype: string
The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.
In [11]: s.str.upper()
Out[11]:
0 ABC
1 <NA>
2 DEF
dtype: string
In [12]: s.str.split('b', expand=True).dtypes
Out[12]:
0 string[pyarrow]
1 string[pyarrow]
dtype: object
String accessor methods returning integers will return a value with Int64Dtype
In [13]: s.str.count("a")
Out[13]:
0 1
1 <NA>
2 0
dtype: Int64
Centered datetime-like rolling windows#
When performing rolling calculations on DataFrame and Series objects with a datetime-like index, a centered datetime-like window can now be used (GH 38780). For example:
In [14]: df = pd.DataFrame(
....: {"A": [0, 1, 2, 3, 4]}, index=pd.date_range("2020", periods=5, freq="1D")
....: )
....:
In [15]: df
Out[15]:
A
2020-01-01 0
2020-01-02 1
2020-01-03 2
2020-01-04 3
2020-01-05 4
In [16]: df.rolling("2D", center=True).mean()
Out[16]:
A
2020-01-01 0.5
2020-01-02 1.5
2020-01-03 2.5
2020-01-04 3.5
2020-01-05 4.0
Other enhancements#
- DataFrame.rolling(), Series.rolling(), DataFrame.expanding(), and Series.expanding() now support a method argument with a 'table' option that performs the windowing operation over an entire DataFrame. See Window Overview for performance and functional benefits (GH 15095, GH 38995)
- ExponentialMovingWindow now supports an online method that can perform mean calculations in an online fashion. See Window Overview (GH 41673)
- Added MultiIndex.dtypes() (GH 37062)
- Added end and end_day options for the origin argument in DataFrame.resample() (GH 37804)
- Improved error message when usecols and names do not match for read_csv() and engine="c" (GH 29042)
- Improved consistency of error messages when passing an invalid win_type argument in Window methods (GH 15969)
- read_sql_query() now accepts a dtype argument to cast the columnar data from the SQL database based on user input (GH 10285)
- read_csv() now raises ParserWarning if the length of the header or given names does not match the length of the data when usecols is not specified (GH 21768)
- Improved integer type mapping from pandas to SQLAlchemy when using DataFrame.to_sql() (GH 35076)
- to_numeric() now supports downcasting of nullable ExtensionDtype objects (GH 33013)
- Added support for dict-like names in MultiIndex.set_names and MultiIndex.rename (GH 20421)
- read_excel() can now auto-detect .xlsb files and older .xls files (GH 35416, GH 41225)
- ExcelWriter now accepts an if_sheet_exists parameter to control the behavior of append mode when writing to existing sheets (GH 40230)
- Rolling.sum(), Expanding.sum(), Rolling.mean(), Expanding.mean(), ExponentialMovingWindow.mean(), Rolling.median(), Expanding.median(), Rolling.max(), Expanding.max(), Rolling.min(), and Expanding.min() now support Numba execution with the engine keyword (GH 38895, GH 41267)
- DataFrame.apply() can now accept NumPy unary operators as strings, e.g. df.apply("sqrt"), which was already the case for Series.apply() (GH 39116)
- DataFrame.apply() can now accept non-callable DataFrame properties as strings, e.g. df.apply("size"), which was already the case for Series.apply() (GH 39116)
- DataFrame.applymap() can now accept kwargs to pass on to the user-provided func (GH 39987)
- Passing a DataFrame indexer to iloc is now disallowed for Series.__getitem__() and DataFrame.__getitem__() (GH 39004)
- Series.apply() can now accept list-like or dictionary-like arguments that aren’t lists or dictionaries, e.g. ser.apply(np.array(["sum", "mean"])), which was already the case for DataFrame.apply() (GH 39140)
- DataFrame.plot.scatter() can now accept a categorical column for the argument c (GH 12380, GH 31357)
- Series.loc() now raises a helpful error message when the Series has a MultiIndex and the indexer has too many dimensions (GH 35349)
- read_stata() now supports reading data from compressed files (GH 26599)
- Added support for parsing ISO 8601-like timestamps with negative signs to Timedelta (GH 37172)
- Added support for unary operators in FloatingArray (GH 38749)
- RangeIndex can now be constructed by passing a range object directly, e.g. pd.RangeIndex(range(3)) (GH 12067)
- Series.round() and DataFrame.round() now work with nullable integer and floating dtypes (GH 38844)
- read_csv() and read_json() expose the argument encoding_errors to control how encoding errors are handled (GH 39450)
- DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() use Kleene logic with nullable data types (GH 37506)
- DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() return a BooleanDtype for columns with nullable data types (GH 33449)
- Fixed DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all(), and SeriesGroupBy.all() raising with object data containing pd.NA even when skipna=True (GH 37501)
- DataFrameGroupBy.rank() and SeriesGroupBy.rank() now support object-dtype data (GH 38278)
- Constructing a DataFrame or Series with the data argument being a Python iterable that is not a NumPy ndarray consisting of NumPy scalars will now result in a dtype with a precision the maximum of the NumPy scalars; this was already the case when data is a NumPy ndarray (GH 40908)
- Added keyword sort to pivot_table() to allow non-sorting of the result (GH 39143)
- Added keyword dropna to DataFrame.value_counts() to allow counting rows that include NA values (GH 41325)
- Series.replace() will now cast results to PeriodDtype where possible instead of object dtype (GH 41526)
- Improved error message in corr and cov methods on Rolling, Expanding, and ExponentialMovingWindow when other is not a DataFrame or Series (GH 41741)
- Series.between() can now accept left or right as arguments to inclusive to include only the left or right boundary (GH 40245)
- DataFrame.explode() now supports exploding multiple columns. Its column argument now also accepts a list of str or tuples for exploding on multiple columns at the same time (GH 39240)
- DataFrame.sample() now accepts the ignore_index argument to reset the index after sampling, similar to DataFrame.drop_duplicates() and DataFrame.sort_values() (GH 38581)
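Several of the smaller enhancements in this list are easiest to see side by side. This is a short sketch with illustrative data and variable names; it covers Series.between() with inclusive="left", multi-column DataFrame.explode(), DataFrame.sample() with ignore_index=True, and building a RangeIndex from a range object.

```python
import pandas as pd

scores = pd.Series([1, 2, 3, 4])
# inclusive="left" keeps the left boundary (2) and excludes the right one (4).
left_closed = scores.between(2, 4, inclusive="left")

nested = pd.DataFrame({"A": [[1, 2], [3]], "B": [["x", "y"], ["z"]], "C": [10, 20]})
# Explode two list-like columns together; lists in the same row must match in length.
exploded = nested.explode(["A", "B"])

# ignore_index=True relabels the sampled rows 0..n-1.
sampled = exploded.sample(n=2, random_state=0, ignore_index=True)

# A RangeIndex built directly from a range object.
idx = pd.RangeIndex(range(3))
```

After the explode, the original row labels repeat (0, 0, 1), which is why resetting the index while sampling can be convenient.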
Notable bug fixes#
These are bug fixes that might have notable behavior changes.
Categorical.unique now always maintains same dtype as original#
Previously, when calling Categorical.unique() with categorical data, unused categories in the new array
would be removed, making the dtype of the new array different than the
original (GH 18291)
As an example of this, given:
In [17]: dtype = pd.CategoricalDtype(['bad', 'neutral', 'good'], ordered=True)
In [18]: cat = pd.Categorical(['good', 'good', 'bad', 'bad'], dtype=dtype)
In [19]: original = pd.Series(cat)
In [20]: unique = original.unique()
Previous behavior:
In [1]: unique
['good', 'bad']
Categories (2, object): ['bad' < 'good']
In [2]: original.dtype == unique.dtype
False
New behavior:
In [21]: unique
Out[21]:
['good', 'bad']
Categories (3, object): ['bad' < 'neutral' < 'good']
In [22]: original.dtype == unique.dtype
Out[22]: True
Preserve dtypes in DataFrame.combine_first()#
DataFrame.combine_first() will now preserve dtypes (GH 7509)
In [23]: df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
In [24]: df1
Out[24]:
A B
0 1 1
1 2 2
2 3 3
In [25]: df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
In [26]: df2
Out[26]:
B C
2 4 1
3 5 2
4 6 3
In [27]: combined = df1.combine_first(df2)
Previous behavior:
In [1]: combined.dtypes
Out[1]:
A float64
B float64
C float64
dtype: object
New behavior:
In [28]: combined.dtypes
Out[28]:
A float64
B int64
C float64
dtype: object
Groupby methods agg and transform no longer change return dtype for callables#
Previously the methods DataFrameGroupBy.aggregate(),
SeriesGroupBy.aggregate(), DataFrameGroupBy.transform(), and
SeriesGroupBy.transform() might cast the result dtype when the argument func
is callable, possibly leading to undesirable results (GH 21240). The cast would
occur if the result is numeric and casting back to the input dtype does not change any
values as measured by np.allclose. Now no such casting occurs.
In [29]: df = pd.DataFrame({'key': [1, 1], 'a': [True, False], 'b': [True, True]})
In [30]: df
Out[30]:
key a b
0 1 True True
1 1 False True
Previous behavior:
In [5]: df.groupby('key').agg(lambda x: x.sum())
Out[5]:
a b
key
1 True 2
New behavior:
In [31]: df.groupby('key').agg(lambda x: x.sum())
Out[31]:
a b
key
1 1 2
float result for DataFrameGroupBy.mean(), DataFrameGroupBy.median(), DataFrameGroupBy.var(), SeriesGroupBy.mean(), SeriesGroupBy.median(), and SeriesGroupBy.var()#
Previously, these methods could result in different dtypes depending on the input values. Now, these methods will always return a float dtype. (GH 41137)
In [32]: df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
Previous behavior:
In [5]: df.groupby(df.index).mean()
Out[5]:
a b c
0 True 1 1.0
New behavior:
In [33]: df.groupby(df.index).mean()
Out[33]:
a b c
0 1.0 1.0 1.0
Try operating inplace when setting values with loc and iloc#
When setting an entire column using loc or iloc, pandas will try to
insert the values into the existing data rather than create an entirely new array.
In [34]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [35]: values = df.values
In [36]: new = np.array([5, 6, 7], dtype="int64")
In [37]: df.loc[[0, 1, 2], "A"] = new
In both the new and old behavior, the data in values is overwritten, but in
the old behavior the dtype of df["A"] changed to int64.
Previous behavior:
In [1]: df.dtypes
Out[1]:
A int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False
In pandas 1.3.0, df continues to share data with values
New behavior:
In [38]: df.dtypes
Out[38]:
A float64
dtype: object
In [39]: np.shares_memory(df["A"], new)
Out[39]: False
In [40]: np.shares_memory(df["A"], values)
Out[40]: True
Never operate inplace when setting frame[keys] = values#
When setting multiple columns using frame[keys] = values, new arrays will
replace the pre-existing arrays for these keys; the pre-existing arrays will
not be overwritten (GH 39510). As a result, the columns will retain the
dtype(s) of values, never casting to the dtypes of the existing arrays.
In [41]: df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
In [42]: df[["A"]] = 5
In the old behavior, 5 was cast to float64 and inserted into the existing
array backing df:
Previous behavior:
In [1]: df.dtypes
Out[1]:
A float64
In the new behavior, we get a new array, and retain an integer-dtyped 5:
New behavior:
In [43]: df.dtypes
Out[43]:
A int64
dtype: object
Consistent casting with setting into Boolean Series#
Setting non-boolean values into a Series with dtype=bool now consistently
casts to dtype=object (GH 38709)
In [1]: orig = pd.Series([True, False])
In [2]: ser = orig.copy()
In [3]: ser.iloc[1] = np.nan
In [4]: ser2 = orig.copy()
In [5]: ser2.iloc[1] = 2.0
Previous behavior:
In [1]: ser
Out[1]:
0 1.0
1 NaN
dtype: float64
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
New behavior:
In [1]: ser
Out[1]:
0 True
1 NaN
dtype: object
In [2]: ser2
Out[2]:
0 True
1 2.0
dtype: object
DataFrameGroupBy.rolling and SeriesGroupBy.rolling no longer return grouped-by column in values#
The group-by column will now be dropped from the result of a
groupby.rolling operation (GH 32262)
In [44]: df = pd.DataFrame({"A": [1, 1, 2, 3], "B": [0, 1, 2, 3]})
In [45]: df
Out[45]:
A B
0 1 0
1 1 1
2 2 2
3 3 3
Previous behavior:
In [1]: df.groupby("A").rolling(2).sum()
Out[1]:
A B
A
1 0 NaN NaN
1 2.0 1.0
2 2 NaN NaN
3 3 NaN NaN
New behavior:
In [46]: df.groupby("A").rolling(2).sum()
Out[46]:
B
A
1 0 NaN
1 1.0
2 2 NaN
3 3 NaN
Removed artificial truncation in rolling variance and standard deviation#
Rolling.std() and Rolling.var() will no longer
artificially truncate results that are less than ~1e-8 and ~1e-15 respectively to
zero (GH 37051, GH 40448, GH 39872).
However, floating point artifacts may now exist in the results when rolling over larger values.
In [47]: s = pd.Series([7, 5, 5, 5])
In [48]: s.rolling(3).var()
Out[48]:
0 NaN
1 NaN
2 1.333333
3 0.000000
dtype: float64
DataFrameGroupBy.rolling and SeriesGroupBy.rolling with MultiIndex no longer drop levels in the result#
DataFrameGroupBy.rolling() and SeriesGroupBy.rolling() will no longer drop levels of a DataFrame
with a MultiIndex in the result. This can lead to a perceived duplication of levels in the resulting
MultiIndex, but this change restores the behavior that was present in version 1.1.3 (GH 38787, GH 38523).
In [49]: index = pd.MultiIndex.from_tuples([('idx1', 'idx2')], names=['label1', 'label2'])
In [50]: df = pd.DataFrame({'a': [1], 'b': [2]}, index=index)
In [51]: df
Out[51]:
a b
label1 label2
idx1 idx2 1 2
Previous behavior:
In [1]: df.groupby('label1').rolling(1).sum()
Out[1]:
a b
label1
idx1 1.0 2.0
New behavior:
In [52]: df.groupby('label1').rolling(1).sum()
Out[52]:
a b
label1 label1 label2
idx1 idx1 idx2 1.0 2.0
Backwards incompatible API changes#
Increased minimum versions for dependencies#
Some minimum supported versions of dependencies were updated. If installed, we now require:
| Package | Minimum Version | Required | Changed |
|---|---|---|---|
| numpy | 1.17.3 | X | X |
| pytz | 2017.3 | X | |
| python-dateutil | 2.7.3 | X | |
| bottleneck | 1.2.1 | | |
| numexpr | 2.7.0 | | X |
| pytest (dev) | 6.0 | | X |
| mypy (dev) | 0.812 | | X |
| setuptools | 38.6.0 | | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
| Package | Minimum Version | Changed |
|---|---|---|
| beautifulsoup4 | 4.6.0 | |
| fastparquet | 0.4.0 | X |
| fsspec | 0.7.4 | |
| gcsfs | 0.6.0 | |
| lxml | 4.3.0 | |
| matplotlib | 2.2.3 | |
| numba | 0.46.0 | |
| openpyxl | 3.0.0 | X |
| pyarrow | 0.17.0 | X |
| pymysql | 0.8.1 | X |
| pytables | 3.5.1 | |
| s3fs | 0.4.0 | |
| scipy | 1.2.0 | |
| sqlalchemy | 1.3.0 | X |
| tabulate | 0.8.7 | X |
| xarray | 0.12.0 | |
| xlrd | 1.2.0 | |
| xlsxwriter | 1.0.2 | |
| xlwt | 1.3.0 | |
| pandas-gbq | 0.12.0 | |
See Dependencies and Optional dependencies for more.
Other API changes#
- Partially initialized CategoricalDtype objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects (GH 38516)
- Accessing _constructor_expanddim on a DataFrame and _constructor_sliced on a Series now raise an AttributeError. Previously a NotImplementedError was raised (GH 38782)
- Added new engine and **engine_kwargs parameters to DataFrame.to_sql() to support other future “SQL engines”. Currently we still only use SQLAlchemy under the hood, but more engines are planned to be supported such as turbodbc (GH 36893)
- Removed redundant freq from PeriodIndex string representation (GH 41653)
- ExtensionDtype.construct_array_type() is now a required method instead of an optional one for ExtensionDtype subclasses (GH 24860)
- Calling hash on non-hashable pandas objects will now raise TypeError with the built-in error message (e.g. unhashable type: 'Series'). Previously it would raise a custom message such as 'Series' objects are mutable, thus they cannot be hashed. Furthermore, isinstance(<Series>, collections.abc.Hashable) will now return False (GH 40013)
- Styler.from_custom_template() now has two new arguments for template names, and the old name argument has been removed, due to template inheritance having been introduced for better parsing (GH 42053). Subclassing modifications to Styler attributes are also needed.
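The hashing change can be checked directly. This is a minimal sketch with illustrative variable names; it captures the error message instead of printing it.

```python
import collections.abc

import pandas as pd

ser = pd.Series([1, 2, 3])

# hash() now surfaces Python's built-in message for unhashable types.
try:
    hash(ser)
    message = None
except TypeError as err:
    message = str(err)

# The abstract base class check agrees that a Series is not hashable.
is_hashable = isinstance(ser, collections.abc.Hashable)
```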
Build#
- Documentation in .pptx and .pdf formats is no longer included in wheels or source distributions (GH 30741)
Deprecations#
Deprecated dropping nuisance columns in DataFrame reductions and DataFrameGroupBy operations#
When calling a reduction (e.g. .min, .max, .sum) on a DataFrame with
numeric_only=None (the default), columns where the reduction raises a TypeError
are silently ignored and dropped from the result.
This behavior is deprecated. In a future version, the TypeError will be raised,
and users will need to select only valid columns before calling the function.
For example:
In [53]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [54]: df
Out[54]:
A B
0 1 2016-01-01
1 2 2016-01-02
2 3 2016-01-03
3 4 2016-01-04
Old behavior:
In [3]: df.prod()
Out[3]:
A 24
dtype: int64
Future behavior:
In [4]: df.prod()
...
TypeError: 'DatetimeArray' does not implement reduction 'prod'
In [5]: df[["A"]].prod()
Out[5]:
A 24
dtype: int64
Similarly, when applying a function to DataFrameGroupBy, columns on which
the function raises TypeError are currently silently ignored and dropped
from the result.
This behavior is deprecated. In a future version, the TypeError
will be raised, and users will need to select only valid columns before calling
the function.
For example:
In [55]: df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
In [56]: gb = df.groupby([1, 1, 2, 2])
Old behavior:
In [4]: gb.prod(numeric_only=False)
Out[4]:
A
1 2
2 12
Future behavior:
In [5]: gb.prod(numeric_only=False)
...
TypeError: datetime64 type does not support prod operations
In [6]: gb[["A"]].prod(numeric_only=False)
Out[6]:
A
1 2
2 12
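When the valid columns are not known by name, they can be selected by dtype before reducing. This is a sketch under the assumption that "valid" means numeric here; select_dtypes() is an existing DataFrame method, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# Keep only numeric columns so the reduction never sees the datetime column.
numeric = df.select_dtypes(include="number")
product = numeric.prod()

# Same idea for groupby: select the valid columns before reducing.
grouped = df.groupby([1, 1, 2, 2])[["A"]].prod()
```

Selecting the columns explicitly gives the same result before and after the deprecation is enforced.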
Other Deprecations#
- Deprecated allowing scalars to be passed to the Categorical constructor (GH 38433)
- Deprecated constructing CategoricalIndex without passing list-like data (GH 38944)
- Deprecated allowing subclass-specific keyword arguments in the Index constructor, use the specific subclass directly instead (GH 14093, GH 21311, GH 22315, GH 26974)
- Deprecated the astype() method of datetimelike (timedelta64[ns], datetime64[ns], DatetimeTZDtype, PeriodDtype) to convert to integer dtypes, use values.view(...) instead (GH 38544). This deprecation was later reverted in pandas 1.4.0.
- Deprecated MultiIndex.is_lexsorted() and MultiIndex.lexsort_depth(), use MultiIndex.is_monotonic_increasing() instead (GH 32259)
- Deprecated keyword try_cast in Series.where(), Series.mask(), DataFrame.where(), DataFrame.mask(); cast results manually if desired (GH 38836)
- Deprecated comparison of Timestamp objects with datetime.date objects. Instead of e.g. ts <= mydate use ts <= pd.Timestamp(mydate) or ts.date() <= mydate (GH 36131)
- Deprecated Rolling.win_type returning "freq" (GH 38963)
- Deprecated Rolling.is_datetimelike (GH 38963)
- Deprecated DataFrame indexer for Series.__setitem__() and DataFrame.__setitem__() (GH 39004)
- Deprecated ExponentialMovingWindow.vol() (GH 39220)
- Using .astype to convert between datetime64[ns] dtype and DatetimeTZDtype is deprecated and will raise in a future version, use obj.tz_localize or obj.dt.tz_localize instead (GH 38622)
- Deprecated casting datetime.date objects to datetime64 when used as fill_value in DataFrame.unstack(), DataFrame.shift(), Series.shift(), and DataFrame.reindex(), pass pd.Timestamp(dateobj) instead (GH 39767)
- Deprecated Styler.set_na_rep() and Styler.set_precision() in favor of Styler.format() with na_rep and precision as existing and new input arguments respectively (GH 40134, GH 40425)
- Deprecated Styler.where() in favor of using an alternative formulation with Styler.applymap() (GH 40821)
- Deprecated allowing partial failure in Series.transform() and DataFrame.transform() when func is list-like or dict-like and raises anything but TypeError; func raising anything but a TypeError will raise in a future version (GH 40211)
- Deprecated arguments error_bad_lines and warn_bad_lines in read_csv() and read_table() in favor of argument on_bad_lines (GH 15122)
- Deprecated support for np.ma.mrecords.MaskedRecords in the DataFrame constructor, pass {name: data[name] for name in data.dtype.names} instead (GH 40363)
- Deprecated using merge(), DataFrame.merge(), and DataFrame.join() on a different number of levels (GH 34862)
- Deprecated the use of **kwargs in ExcelWriter; use the keyword argument engine_kwargs instead (GH 40430)
- Deprecated the level keyword for DataFrame and Series aggregations; use groupby instead (GH 39983)
- Deprecated the inplace parameter of Categorical.remove_categories(), Categorical.add_categories(), Categorical.reorder_categories(), Categorical.rename_categories(), and Categorical.set_categories(); it will be removed in a future version (GH 37643)
- Deprecated merge() producing duplicated columns through the suffixes keyword and already existing columns (GH 22818)
- Deprecated setting Categorical._codes, create a new Categorical with the desired codes instead (GH 40606)
- Deprecated the convert_float optional argument in read_excel() and ExcelFile.parse() (GH 41127)
- Deprecated behavior of DatetimeIndex.union() with mixed timezones; in a future version both will be cast to UTC instead of object dtype (GH 39328)
- Deprecated using usecols with out of bounds indices for read_csv() with engine="c" (GH 25623)
- Deprecated special treatment of lists with first element a Categorical in the DataFrame constructor; pass as pd.DataFrame({col: categorical, ...}) instead (GH 38845)
- Deprecated behavior of DataFrame constructor when a dtype is passed and the data cannot be cast to that dtype. In a future version, this will raise instead of being silently ignored (GH 24435)
- Deprecated the Timestamp.freq attribute. For the properties that use it (is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end), when you have a freq, use e.g. freq.is_month_start(ts) (GH 15146)
- Deprecated construction of Series or DataFrame with DatetimeTZDtype data and datetime64[ns] dtype. Use Series(data).dt.tz_localize(None) instead (GH 41555, GH 33401)
- Deprecated behavior of Series construction with large-integer values and small-integer dtype silently overflowing; use Series(data).astype(dtype) instead (GH 41734)
- Deprecated behavior of DataFrame construction with floating data and integer dtype casting even when lossy; in a future version this will remain floating, matching Series behavior (GH 41770)
- Deprecated inference of timedelta64[ns], datetime64[ns], or DatetimeTZDtype dtypes in Series construction when data containing strings is passed and no dtype is passed (GH 33558)
- In a future version, constructing Series or DataFrame with datetime64[ns] data and DatetimeTZDtype will treat the data as wall-times instead of as UTC times (matching DatetimeIndex behavior). To treat the data as UTC times, use pd.Series(data).dt.tz_localize("UTC").dt.tz_convert(dtype.tz) or pd.Series(data.view("int64"), dtype=dtype) (GH 33401)
- Deprecated passing lists as key to DataFrame.xs() and Series.xs() (GH 41760)
- Deprecated boolean arguments of inclusive in Series.between() to have {"left", "right", "neither", "both"} as standard argument values (GH 40628)
- Deprecated passing arguments as positional for all of the following, with exceptions noted (GH 41485):
  - concat() (other than objs)
  - read_csv() (other than filepath_or_buffer)
  - read_table() (other than filepath_or_buffer)
  - DataFrame.clip() and Series.clip() (other than upper and lower)
  - DataFrame.drop_duplicates() (except for subset), Series.drop_duplicates(), Index.drop_duplicates() and MultiIndex.drop_duplicates()
  - DataFrame.drop() (other than labels) and Series.drop()
  - DataFrame.ffill(), Series.ffill(), DataFrame.bfill(), and Series.bfill()
  - DataFrame.fillna() and Series.fillna() (apart from value)
  - DataFrame.interpolate() and Series.interpolate() (other than method)
  - DataFrame.mask() and Series.mask() (other than cond and other)
  - DataFrame.reset_index() (other than level) and Series.reset_index()
  - DataFrame.set_axis() and Series.set_axis() (other than labels)
  - DataFrame.set_index() (other than keys)
  - DataFrame.sort_values() (other than by) and Series.sort_values()
  - DataFrame.where() and Series.where() (other than cond and other)
  - Index.set_names() and MultiIndex.set_names() (except for names)
  - MultiIndex.set_codes() (except for codes)
  - MultiIndex.set_levels() (except for levels)
  - Resampler.interpolate() (other than method)
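Some of the suggested replacements can be sketched as follows. The data and variable names are illustrative, and the sketch only covers on_bad_lines, the Timestamp versus datetime.date comparison, and groupby(level=...) as a substitute for the level keyword of aggregations.

```python
import datetime
import io

import pandas as pd

# on_bad_lines replaces error_bad_lines / warn_bad_lines.
raw = "a,b\n1,2\n3,4,5\n6,7\n"  # the third line has one field too many
clean = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# Compare a Timestamp with a date by converting one side explicitly.
ts = pd.Timestamp("2021-07-02 12:00")
release_day = datetime.date(2021, 7, 2)
after_midnight = ts >= pd.Timestamp(release_day)
same_day_or_earlier = ts.date() <= release_day

# Replace the deprecated `level` keyword of aggregations with groupby(level=...).
values = pd.Series(
    [1, 2, 3],
    index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2), ("y", 1)], names=["key", "n"]),
)
totals = values.groupby(level="key").sum()
```

Note that the two date comparisons are not equivalent: pd.Timestamp(release_day) is midnight of that day, while ts.date() drops the time of day.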
Performance improvements#
- Performance improvement in IntervalIndex.isin() (GH 38353)
- Performance improvement in Series.mean() for nullable data types (GH 34814)
- Performance improvement in Series.isin() for nullable data types (GH 38340)
- Performance improvement in DataFrame.fillna() with method="pad" or method="backfill" for nullable floating and nullable integer dtypes (GH 39953)
- Performance improvement in DataFrame.corr() for method=kendall (GH 28329)
- Performance improvement in DataFrame.corr() for method=spearman (GH 40956, GH 41885)
- Performance improvement in Rolling.corr() and Rolling.cov() (GH 39388)
- Performance improvement in RollingGroupby.corr(), ExpandingGroupby.corr(), and ExpandingGroupby.cov() (GH 39591)
- Performance improvement in unique() for object data type (GH 37615)
- Performance improvement in json_normalize() for basic cases (including separators) (GH 40035, GH 15621)
- Performance improvement in ExpandingGroupby aggregation methods (GH 39664)
- Performance improvement in Styler where render times are more than 50% reduced and now match DataFrame.to_html() (GH 39972, GH 39952, GH 40425)
- The method Styler.set_td_classes() is now as performant as Styler.apply() and Styler.applymap(), and even more so in some cases (GH 40453)
- Performance improvement in ExponentialMovingWindow.mean() with times (GH 39784)
- Performance improvement in DataFrameGroupBy.apply() and SeriesGroupBy.apply() when requiring the Python fallback implementation (GH 40176)
- Performance improvement in the conversion of a PyArrow Boolean array to a pandas nullable Boolean array (GH 41051)
- Performance improvement for concatenation of data with type CategoricalDtype (GH 40193)
- Performance improvement in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax(), and SeriesGroupBy.cummax() with nullable data types (GH 37493)
- Performance improvement in Series.nunique() with nan values (GH 40865)
- Performance improvement in DataFrame.transpose(), Series.unstack() with DatetimeTZDtype (GH 40149)
- Performance improvement in Series.plot() and DataFrame.plot() with entry point lazy loading (GH 41492)
Bug fixes#
Categorical#
- Bug in CategoricalIndex incorrectly failing to raise TypeError when scalar data is passed (GH 38614)
- Bug in CategoricalIndex.reindex failing when the Index passed was not categorical but its values were all labels in the category (GH 28690)
- Bug where constructing a Categorical from an object-dtype array of date objects did not round-trip correctly with astype (GH 38552)
- Bug in constructing a DataFrame from an ndarray and a CategoricalDtype (GH 38857)
- Bug in setting categorical values into an object-dtype column in a DataFrame (GH 39136)
- Bug in DataFrame.reindex() raising an IndexError when the new index contained duplicates and the old index was a CategoricalIndex (GH 38906)
- Bug in Categorical.fillna() with a tuple-like category raising NotImplementedError instead of ValueError when filling with a non-category tuple (GH 41914)
Datetimelike#
Bug in DataFrame and Series constructors sometimes dropping nanoseconds from Timestamp (resp. Timedelta) data, with dtype=datetime64[ns] (resp. timedelta64[ns]) (GH 38032)
Bug in DataFrame.first() and Series.first() with an offset of one month returning an incorrect result when the first day is the last day of a month (GH 29623)
Bug in constructing a DataFrame or Series with mismatched datetime64 data and timedelta64 dtype, or vice-versa, failing to raise a TypeError (GH 38575, GH 38764, GH 38792)
Bug in constructing a Series or DataFrame with a datetime object out of bounds for datetime64[ns] dtype or a timedelta object out of bounds for timedelta64[ns] dtype (GH 38792, GH 38965)
Bug in DatetimeIndex.intersection(), DatetimeIndex.symmetric_difference(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38741)
Bug in DatetimeIndex.intersection() giving incorrect results with non-Tick frequencies with n != 1 (GH 42104)
Bug in Series.where() incorrectly casting datetime64 values to int64 (GH 37682)
Bug in Categorical incorrectly typecasting datetime object to Timestamp (GH 38878)
Bug in comparisons between Timestamp object and datetime64 objects just outside the implementation bounds for nanosecond datetime64 (GH 39221)
Bug in Timestamp.round(), Timestamp.floor(), Timestamp.ceil() for values near the implementation bounds of Timestamp (GH 39244)
Bug in Timedelta.round(), Timedelta.floor(), Timedelta.ceil() for values near the implementation bounds of Timedelta (GH 38964)
Bug in date_range() incorrectly creating DatetimeIndex containing NaT instead of raising OutOfBoundsDatetime in corner cases (GH 24124)
Bug in infer_freq() incorrectly fails to infer ‘H’ frequency of DatetimeIndex if the latter has a timezone and crosses DST boundaries (GH 39556)
Bug in Series backed by DatetimeArray or TimedeltaArray sometimes failing to set the array’s freq to None (GH 41425)
Timedelta#
Bug in constructing Timedelta from np.timedelta64 objects with non-nanosecond units that are out of bounds for timedelta64[ns] (GH 38965)
Bug in constructing a TimedeltaIndex incorrectly accepting np.datetime64("NaT") objects (GH 39462)
Bug in constructing Timedelta from an input string with only symbols and no digits failed to raise an error (GH 39710)
Bug in TimedeltaIndex and to_timedelta() failing to raise when passed non-nanosecond timedelta64 arrays that overflow when converting to timedelta64[ns] (GH 40008)
Timezones#
Numeric#
Bug in DataFrame.quantile(), DataFrame.sort_values() causing incorrect subsequent indexing behavior (GH 38351)
Bug in DataFrame.sort_values() raising an IndexError for empty by (GH 40258)
Bug in DataFrame.select_dtypes() with include=np.number would drop numeric ExtensionDtype columns (GH 35340)
Bug in DataFrame.mode() and Series.mode() not keeping consistent integer Index for empty input (GH 33321)
Bug in DataFrame.rank() when the DataFrame contained np.inf (GH 32593)
Bug in DataFrame.rank() with axis=0 and columns holding incomparable types raising an IndexError (GH 38932)
Bug in Series.rank(), DataFrame.rank(), DataFrameGroupBy.rank(), and SeriesGroupBy.rank() treating the most negative int64 value as missing (GH 32859)
Bug in DataFrame.select_dtypes() different behavior between Windows and Linux with include="int" (GH 36596)
Bug in DataFrame.apply() and DataFrame.agg() when passed the argument func="size" would operate on the entire DataFrame instead of rows or columns (GH 39934)
Bug in DataFrame.transform() would raise a SpecificationError when passed a dictionary and columns were missing; will now raise a KeyError instead (GH 40004)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() giving incorrect results with pct=True and equal values between consecutive groups (GH 40518)
Bug in Series.count() would result in an int32 result on 32-bit platforms when argument level=None (GH 40908)
Bug in Series and DataFrame reductions with methods any and all not returning Boolean results for object data (GH 12863, GH 35450, GH 27709)
Bug in Series.clip() would fail if the Series contains NA values and has nullable int or float as a data type (GH 40851)
Bug in UInt64Index.where() and UInt64Index.putmask() with an np.int64 dtype other incorrectly raising TypeError (GH 41974)
Bug in DataFrame.agg() not sorting the aggregated axis in the order of the provided aggregation functions when one or more aggregation function fails to produce results (GH 33634)
Bug in DataFrame.clip() not interpreting missing values as no threshold (GH 40420)
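As a rough illustration of the DataFrame.clip() entry above, the sketch below clips each row against a per-row lower bound that contains one missing value. The frame and threshold values are made up for the example; the row whose threshold is NaN is expected to pass through unclipped.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names are arbitrary.
df = pd.DataFrame({"col_0": [9, -3, 0, -1, 5], "col_1": [-2, -7, 6, 8, -5]})

# Per-row lower bounds; the NaN in position 2 means "no threshold" for that row.
lower = pd.Series([2, -4, np.nan, 6, 3])

clipped = df.clip(lower=lower, axis=0)
print(clipped)
```

Row 2 keeps its original values (0 and 6) because its threshold is missing, while the other rows are raised to at least their bound.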
Conversion#
Bug in Series.to_dict() with orient='records'; it now returns Python native types (GH 25969)
Bug in Series.view() and Index.view() when converting between datetime-like (datetime64[ns], datetime64[ns, tz], timedelta64, period) dtypes (GH 39788)
Bug in creating a DataFrame from an empty np.recarray not retaining the original dtypes (GH 40121)
Bug in DataFrame failing to raise a TypeError when constructing from a frozenset (GH 40163)
Bug in Index construction silently ignoring a passed dtype when the data cannot be cast to that dtype (GH 21311)
Bug in StringArray.astype() falling back to NumPy and raising when converting to dtype='categorical' (GH 40450)
Bug in factorize() where, when given an array with a numeric NumPy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (GH 41132)
Bug in DataFrame construction with a dictionary containing an array-like with ExtensionDtype and copy=True failing to make a copy (GH 38939)
Bug in qcut() raising error when taking Float64DType as input (GH 40730)
Bug in DataFrame and Series construction with datetime64[ns] data and dtype=object resulting in datetime objects instead of Timestamp objects (GH 41599)
Bug in DataFrame and Series construction with timedelta64[ns] data and dtype=object resulting in np.timedelta64 objects instead of Timedelta objects (GH 41599)
Bug in DataFrame construction when given a two-dimensional object-dtype np.ndarray of Period or Interval objects failing to cast to PeriodDtype or IntervalDtype, respectively (GH 41812)
Bug in constructing a Series from a list and a PandasDtype (GH 39357)
Bug in creating a Series from a range object that does not fit in the bounds of int64 dtype (GH 30173)
Bug in creating a Series from a dict with all-tuple keys and an Index that requires reindexing (GH 41707)
Bug in infer_dtype() not recognizing Series, Index, or array with a Period dtype (GH 23553)
Bug in infer_dtype() raising an error for general ExtensionArray objects. It will now return "unknown-array" instead of raising (GH 37367)
Bug in DataFrame.convert_dtypes() incorrectly raised a ValueError when called on an empty DataFrame (GH 40393)
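The factorize() entry above can be checked with a small sketch like the following, which assumes an int16 input array chosen purely for illustration. With the fix, the unique values are expected to keep the narrower input dtype rather than being widened to int64.

```python
import numpy as np
import pandas as pd

# Illustrative input with a dtype narrower than int64.
values = np.array([3, 1, 3, 2], dtype=np.int16)

codes, uniques = pd.factorize(values)

print(codes)          # integer codes pointing into `uniques`
print(uniques.dtype)  # dtype of the unique values
```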
Strings#
Bug in the conversion from pyarrow.ChunkedArray to StringArray when the original had zero chunks (GH 41040)
Bug in Series.replace() and DataFrame.replace() ignoring replacements with regex=True for StringDType data (GH 41333, GH 35977)
Bug in Series.str.extract() with StringArray returning object dtype for an empty DataFrame (GH 41441)
Bug in Series.str.replace() where the case argument was ignored when regex=False (GH 41602)
Interval#
Bug in IntervalIndex.intersection() and IntervalIndex.symmetric_difference() always returning object-dtype when operating with CategoricalIndex (GH 38653, GH 38741)
Bug in IntervalIndex.intersection() returning duplicates when at least one of the Index objects have duplicates which are present in the other (GH 38743)
IntervalIndex.union(), IntervalIndex.intersection(), IntervalIndex.difference(), and IntervalIndex.symmetric_difference() now cast to the appropriate dtype instead of raising a TypeError when operating with another IntervalIndex with incompatible dtype (GH 39267)
PeriodIndex.union(), PeriodIndex.intersection(), PeriodIndex.symmetric_difference(), PeriodIndex.difference() now cast to object dtype instead of raising IncompatibleFrequency when operating with another PeriodIndex with incompatible dtype (GH 39306)
Bug in IntervalIndex.is_monotonic(), IntervalIndex.get_loc(), IntervalIndex.get_indexer_for(), and IntervalIndex.__contains__() when NA values are present (GH 41831)
Indexing#
Bug in Index.union() and MultiIndex.union() dropping duplicate Index values when Index was not monotonic or sort was set to False (GH 36289, GH 31326, GH 40862)
Bug in CategoricalIndex.get_indexer() failing to raise InvalidIndexError when non-unique (GH 38372)
Bug in IntervalIndex.get_indexer() when target has CategoricalDtype and both the index and the target contain NA values (GH 41934)
Bug in Series.loc() raising a ValueError when input was filtered with a Boolean list and values to set were a list with lower dimension (GH 20438)
Bug in inserting many new columns into a DataFrame causing incorrect subsequent indexing behavior (GH 38380)
Bug in DataFrame.__setitem__() raising a ValueError when setting multiple values to duplicate columns (GH 15695)
Bug in DataFrame.loc(), Series.loc(), DataFrame.__getitem__() and Series.__getitem__() returning incorrect elements for non-monotonic DatetimeIndex for string slices (GH 33146)
Bug in DataFrame.reindex() and Series.reindex() with timezone aware indexes raising a TypeError for method="ffill" and method="bfill" and specified tolerance (GH 38566)
Bug in DataFrame.reindex() with datetime64[ns] or timedelta64[ns] incorrectly casting to integers when the fill_value requires casting to object dtype (GH 39755)
Bug in DataFrame.__setitem__() raising a ValueError when setting on an empty DataFrame using specified columns and a nonempty DataFrame value (GH 38831)
Bug in DataFrame.loc.__setitem__() raising a ValueError when operating on a unique column when the DataFrame has duplicate columns (GH 38521)
Bug in DataFrame.iloc.__setitem__() and DataFrame.loc.__setitem__() with mixed dtypes when setting with a dictionary value (GH 38335)
Bug in Series.loc.__setitem__() and DataFrame.loc.__setitem__() raising KeyError when provided a Boolean generator (GH 39614)
Bug in Series.iloc() and DataFrame.iloc() raising a KeyError when provided a generator (GH 39614)
Bug in DataFrame.__setitem__() not raising a ValueError when the right hand side is a DataFrame with wrong number of columns (GH 38604)
Bug in Series.__setitem__() raising a ValueError when setting a Series with a scalar indexer (GH 38303)
Bug in DataFrame.loc() dropping levels of a MultiIndex when the DataFrame used as input has only one row (GH 10521)
Bug in DataFrame.__getitem__() and Series.__getitem__() always raising KeyError when slicing with existing strings where the Index has milliseconds (GH 33589)
Bug in setting timedelta64 or datetime64 values into numeric Series failing to cast to object dtype (GH 39086, GH 39619)
Bug in setting Interval values into a Series or DataFrame with mismatched IntervalDtype incorrectly casting the new values to the existing dtype (GH 39120)
Bug in setting datetime64 values into a Series with integer-dtype incorrectly casting the datetime64 values to integers (GH 39266)
Bug in setting np.datetime64("NaT") into a Series with Datetime64TZDtype incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
Bug in Index.get_loc() not raising KeyError when key=NaN and method is specified but NaN is not in the Index (GH 39382)
Bug in DatetimeIndex.insert() when inserting np.datetime64("NaT") into a timezone-aware index incorrectly treating the timezone-naive value as timezone-aware (GH 39769)
Bug in incorrectly raising in Index.insert(), when setting a new column that cannot be held in the existing frame.columns, or in Series.reset_index() or DataFrame.reset_index() instead of casting to a compatible dtype (GH 39068)
Bug in RangeIndex.append() where a single object of length 1 was concatenated incorrectly (GH 39401)
Bug in RangeIndex.astype() where when converting to CategoricalIndex, the categories became an Int64Index instead of a RangeIndex (GH 41263)
Bug in setting numpy.timedelta64 values into an object-dtype Series using a Boolean indexer (GH 39488)
Bug in setting numeric values into a boolean-dtype Series using at or iat failing to cast to object-dtype (GH 39582)
Bug in DataFrame.__setitem__() and DataFrame.iloc.__setitem__() raising ValueError when trying to index with a row-slice and setting a list as values (GH 40440)
Bug in DataFrame.loc() not raising KeyError when the key was not found in MultiIndex and the levels were not fully specified (GH 41170)
Bug in DataFrame.loc.__setitem__() when setting-with-expansion incorrectly raising when the index in the expanding axis contained duplicates (GH 40096)
Bug in DataFrame.loc.__getitem__() with MultiIndex casting to float when at least one index column has float dtype and we retrieve a scalar (GH 41369)
Bug in DataFrame.loc() incorrectly matching non-Boolean index elements (GH 20432)
Bug in indexing with np.nan on a Series or DataFrame with a CategoricalIndex incorrectly raising KeyError when np.nan keys are present (GH 41933)
Bug in Series.__delitem__() with ExtensionDtype incorrectly casting to ndarray (GH 40386)
Bug in DataFrame.at() with a CategoricalIndex returning incorrect results when passed integer keys (GH 41846)
Bug in DataFrame.loc() returning a MultiIndex in the wrong order if an indexer has duplicates (GH 40978)
Bug in DataFrame.__setitem__() raising a TypeError when using a str subclass as the column name with a DatetimeIndex (GH 37366)
Bug in PeriodIndex.get_loc() failing to raise a KeyError when given a Period with a mismatched freq (GH 41670)
Bug in .loc.__getitem__ with a UInt64Index and negative-integer keys raising OverflowError instead of KeyError in some cases, wrapping around to positive integers in others (GH 41777)
Bug in Index.get_indexer() failing to raise ValueError in some cases with invalid method, limit, or tolerance arguments (GH 41918)
Bug when slicing a Series or DataFrame with a TimedeltaIndex when passing an invalid string raising ValueError instead of a TypeError (GH 41821)
Bug in Index constructor sometimes silently ignoring a specified dtype (GH 38879)
Index.where() behavior now mirrors Index.putmask() behavior, i.e. index.where(mask, other) matches index.putmask(~mask, other) (GH 39412)
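The equivalence between Index.where() and Index.putmask() stated in the last entry can be sketched as follows. The index values and mask are arbitrary examples: where() keeps entries where the mask is True, while putmask() replaces entries where its mask is True, so inverting the mask should give the same result.

```python
import numpy as np
import pandas as pd

# Illustrative index and mask.
index = pd.Index([10, 20, 30, 40])
mask = np.array([True, False, True, False])

kept = index.where(mask, 0)         # keep values where mask is True
replaced = index.putmask(~mask, 0)  # replace values where ~mask is True

print(kept.equals(replaced))
```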
Missing#
Bug in Grouper did not correctly propagate the dropna argument; DataFrameGroupBy.transform() now correctly handles missing values for dropna=True (GH 35612)
Bug in isna(), Series.isna(), Index.isna(), DataFrame.isna(), and the corresponding notna functions not recognizing Decimal("NaN") objects (GH 39409)
Bug in DataFrame.fillna() not accepting a dictionary for the downcast keyword (GH 40809)
Bug in isna() not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (GH 40935)
Bug in DataFrame construction with float data containing NaN and an integer dtype casting instead of retaining the NaN (GH 26919)
Bug in Series.isin() and MultiIndex.isin() didn’t treat all nans as equivalent if they were in tuples (GH 41836)
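Two of the isna() entries above lend themselves to a short check. The sketch below uses made-up values: first an object-dtype Series containing Decimal("NaN"), then a nullable integer array whose returned mask is modified to confirm that the original array is left untouched.

```python
from decimal import Decimal

import pandas as pd

# Decimal("NaN") is expected to be treated as missing.
values = pd.Series([Decimal("NaN"), Decimal("1.5"), None], dtype=object)
decimal_mask = values.isna()

# For nullable dtypes, isna() should hand back a copy of the internal mask,
# so writing to the result must not alter the array itself.
arr = pd.array([1, None, 3], dtype="Int64")
mask = pd.isna(arr)
mask[0] = True

print(decimal_mask.tolist())
print(arr)
```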
MultiIndex#
Bug in DataFrame.drop() raising a TypeError when the MultiIndex is non-unique and level is not provided (GH 36293)
Bug in MultiIndex.intersection() duplicating NaN in the result (GH 38623)
Bug in MultiIndex.equals() incorrectly returning True when the MultiIndex contained NaN even when they are differently ordered (GH 38439)
Bug in MultiIndex.intersection() always returning an empty result when intersecting with CategoricalIndex (GH 38653)
Bug in MultiIndex.difference() incorrectly raising TypeError when indexes contain non-sortable entries (GH 41915)
Bug in MultiIndex.reindex() raising a ValueError when used on an empty MultiIndex and indexing only a specific level (GH 41170)
Bug in MultiIndex.reindex() raising TypeError when reindexing against a flat Index (GH 41707)
I/O#
Bug in Index.__repr__() when display.max_seq_items=1 (GH 38415)
Bug in read_csv() not recognizing scientific notation if the argument decimal is set and engine="python" (GH 31920)
Bug in read_csv() interpreting an NA value as a comment when the NA value contains the comment string, fixed for engine="python" (GH 34002)
Bug in read_csv() raising an IndexError with multiple header columns and index_col is specified when the file has no data rows (GH 38292)
Bug in read_csv() not accepting usecols with a different length than names for engine="python" (GH 16469)
Bug in read_csv() returning object dtype when delimiter="," with usecols and parse_dates specified for engine="python" (GH 35873)
Bug in read_csv() raising a TypeError when names and parse_dates is specified for engine="c" (GH 33699)
Bug in read_clipboard() and DataFrame.to_clipboard() not working in WSL (GH 38527)
Allow custom error values for the parse_dates argument of read_sql(), read_sql_query() and read_sql_table() (GH 35185)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a KeyError when trying to apply for subclasses of DataFrame or Series (GH 33748)
Bug in HDFStore.put() raising a wrong TypeError when saving a DataFrame with non-string dtype (GH 34274)
Bug in json_normalize() resulting in the first element of a generator object not being included in the returned DataFrame (GH 35923)
Bug in read_csv() applying the thousands separator to date columns when the column should be parsed for dates and usecols is specified for engine="python" (GH 39365)
Bug in read_excel() forward filling MultiIndex names when multiple header and index columns are specified (GH 34673)
Bug in read_excel() not respecting set_option() (GH 34252)
Bug in read_csv() not switching true_values and false_values for nullable Boolean dtype (GH 34655)
Bug in read_json() when orient="split" not maintaining a numeric string index (GH 28556)
read_sql() returned an empty generator if chunksize was non-zero and the query returned no results. Now returns a generator with a single empty DataFrame (GH 34411)
Bug in read_hdf() returning unexpected records when filtering on categorical string columns using the where parameter (GH 39189)
Bug in read_sas() raising a ValueError when datetimes were null (GH 39725)
Bug in read_excel() dropping empty values from single-column spreadsheets (GH 39808)
Bug in read_excel() loading trailing empty rows/columns for some filetypes (GH 41167)
Bug in read_excel() raising an AttributeError when the excel file had a MultiIndex header followed by two empty rows and no index (GH 40442)
Bug in read_excel(), read_csv(), read_table(), read_fwf(), and read_clipboard() where one blank row after a MultiIndex header with no index would be dropped (GH 40442)
Bug in DataFrame.to_string() misplacing the truncation column when index=False (GH 40904)
Bug in DataFrame.to_string() adding an extra dot and misaligning the truncation row when index=False (GH 40904)
Bug in read_orc() always raising an AttributeError (GH 40918)
Bug in read_csv() and read_table() silently ignoring prefix if names and prefix are defined, now raising a ValueError (GH 39123)
Bug in read_csv() and read_excel() not respecting the dtype for a duplicated column name when mangle_dupe_cols is set to True (GH 35211)
Bug in read_csv() silently ignoring sep if delimiter and sep are defined, now raising a ValueError (GH 39823)
Bug in read_csv() and read_table() misinterpreting arguments when sys.setprofile had been previously called (GH 41069)
Bug in the conversion from PyArrow to pandas (e.g. for reading Parquet) with nullable dtypes and a PyArrow array whose data buffer size is not a multiple of the dtype size (GH 40896)
Bug in read_excel() would raise an error when pandas could not determine the file type even though the user specified the engine argument (GH 41225)
Bug in read_clipboard() copying from an excel file shifts values into the wrong column if there are null values in first column (GH 41108)
Bug in DataFrame.to_hdf() and Series.to_hdf() raising a TypeError when trying to append a string column to an incompatible column (GH 41897)
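The read_csv() entry about sep and delimiter can be illustrated with a short sketch. The CSV text and the helper name conflicting_separator_error are hypothetical; the point is only that passing both keywords is now expected to raise a ValueError rather than silently preferring one of them.

```python
import io

import pandas as pd


def conflicting_separator_error(text):
    """Return the error message raised when both sep and delimiter are given.

    Hypothetical helper for illustration; returns None if nothing is raised.
    """
    try:
        pd.read_csv(io.StringIO(text), sep=";", delimiter=",")
    except ValueError as exc:
        return str(exc)
    return None


message = conflicting_separator_error("a,b\n1,2\n")
print(message)
```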
Period#
Plotting#
Bug in plotting.scatter_matrix() raising when 2d ax argument passed (GH 16253)
Prevent warnings when Matplotlib’s constrained_layout is enabled (GH 25261)
Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used yerr while others didn’t (GH 39522)
Bug in DataFrame.plot() was showing the wrong colors in the legend if the function was called repeatedly and some calls used secondary_y and others used legend=False (GH 40044)
Bug in DataFrame.plot.box() when dark_background theme was selected, caps or min/max markers for the plot were not visible (GH 40769)
Groupby/resample/rolling#
Bug in DataFrameGroupBy.agg() and SeriesGroupBy.agg() with PeriodDtype columns incorrectly casting results too aggressively (GH 38254)
Bug in SeriesGroupBy.value_counts() where unobserved categories in a grouped categorical Series were not tallied (GH 38672)
Bug in SeriesGroupBy.value_counts() where an error was raised on an empty Series (GH 39172)
Bug in GroupBy.indices() would contain non-existent indices when null values were present in the groupby keys (GH 9304)
Fixed bug in DataFrameGroupBy.sum() and SeriesGroupBy.sum() causing a loss of precision; they now use Kahan summation (GH 38778)
Fixed bug in DataFrameGroupBy.cumsum(), SeriesGroupBy.cumsum(), DataFrameGroupBy.mean(), and SeriesGroupBy.mean() causing loss of precision; they now use Kahan summation (GH 38934)
Bug in Resampler.aggregate() and DataFrame.transform() raising a TypeError instead of SpecificationError when missing keys had mixed dtypes (GH 39025)
Bug in DataFrameGroupBy.idxmin() and DataFrameGroupBy.idxmax() with ExtensionDtype columns (GH 38733)
Bug in Series.resample() would raise when the index was a PeriodIndex consisting of NaT (GH 39227)
Bug in RollingGroupby.corr() and ExpandingGroupby.corr() where the groupby column would return 0 instead of np.nan when providing other that was longer than each group (GH 39591)
Bug in ExpandingGroupby.corr() and ExpandingGroupby.cov() where 1 would be returned instead of np.nan when providing other that was longer than each group (GH 39591)
Bug in DataFrameGroupBy.mean(), SeriesGroupBy.mean(), DataFrameGroupBy.median(), SeriesGroupBy.median(), and DataFrame.pivot_table() not propagating metadata (GH 28283)
Bug in Series.rolling() and DataFrame.rolling() not calculating window bounds correctly when window is an offset and dates are in descending order (GH 40002)
Bug in Series.groupby() and DataFrame.groupby() on an empty Series or DataFrame would lose index, columns, and/or data types when directly using the methods idxmax, idxmin, mad, min, max, sum, prod, and skew or using them through apply, aggregate, or resample (GH 26411)
Bug in DataFrameGroupBy.apply() and SeriesGroupBy.apply() where a MultiIndex would be created instead of an Index when used on a RollingGroupby object (GH 39732)
Bug in DataFrameGroupBy.sample() where an error was raised when weights was specified and the index was an Int64Index (GH 39927)
Bug in DataFrameGroupBy.aggregate() and Resampler.aggregate() would sometimes raise a SpecificationError when passed a dictionary and columns were missing; will now always raise a KeyError instead (GH 40004)
Bug in DataFrameGroupBy.sample() where column selection was not applied before computing the result (GH 39928)
Bug in ExponentialMovingWindow when calling __getitem__ would incorrectly raise a ValueError when providing times (GH 40164)
Bug in ExponentialMovingWindow when calling __getitem__ would not retain com, span, alpha or halflife attributes (GH 40164)
ExponentialMovingWindow now raises a NotImplementedError when specifying times with adjust=False due to an incorrect calculation (GH 40098)
Bug in ExponentialMovingWindowGroupby.mean() where the times argument was ignored when engine='numba' (GH 40951)
Bug in ExponentialMovingWindowGroupby.mean() where the wrong times were used in the case of multiple groups (GH 40951)
Bug in ExponentialMovingWindowGroupby where the times vector and values became out of sync for non-trivial groups (GH 40951)
Bug in Series.asfreq() and DataFrame.asfreq() dropping rows when the index was not sorted (GH 39805)
Bug in aggregation functions for DataFrame not respecting numeric_only argument when level keyword was given (GH 40660)
Bug in SeriesGroupBy.aggregate() where using a user-defined function to aggregate a Series with an object-typed Index causes an incorrect Index shape (GH 40014)
Bug in RollingGroupby where as_index=False argument in groupby was ignored (GH 39433)
Bug in DataFrameGroupBy.any(), SeriesGroupBy.any(), DataFrameGroupBy.all() and SeriesGroupBy.all() raising a ValueError when using with nullable type columns holding NA even with skipna=True (GH 40585)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() incorrectly rounding integer values near the int64 implementations bounds (GH 40767)
Bug in DataFrameGroupBy.rank() and SeriesGroupBy.rank() with nullable dtypes incorrectly raising a TypeError (GH 41010)
Bug in DataFrameGroupBy.cummin(), SeriesGroupBy.cummin(), DataFrameGroupBy.cummax() and SeriesGroupBy.cummax() computing wrong result with nullable data types too large to roundtrip when casting to float (GH 37493)
Bug in DataFrame.rolling() returning mean zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH 41053)
Bug in DataFrame.rolling() returning sum not zero for all NaN window with min_periods=0 if calculation is not numerically stable (GH 41053)
Bug in SeriesGroupBy.agg() failing to retain ordered CategoricalDtype on order-preserving aggregations (GH 41147)
Bug in DataFrameGroupBy.min(), SeriesGroupBy.min(), DataFrameGroupBy.max() and SeriesGroupBy.max() with multiple object-dtype columns and numeric_only=False incorrectly raising a ValueError (GH 41111)
Bug in DataFrameGroupBy.rank() with the GroupBy object’s axis=0 and the rank method’s keyword axis=1 (GH 41320)
Bug in DataFrameGroupBy.__getitem__() with non-unique columns incorrectly returning a malformed SeriesGroupBy instead of DataFrameGroupBy (GH 41427)
Bug in DataFrameGroupBy.transform() with non-unique columns incorrectly raising an AttributeError (GH 41427)
Bug in Resampler.apply() with non-unique columns incorrectly dropping duplicated columns (GH 41445)
Bug in Series.groupby() aggregations incorrectly returning empty Series instead of raising TypeError on aggregations that are invalid for its dtype, e.g. .prod with datetime64[ns] dtype (GH 41342)
Bug in DataFrameGroupBy aggregations incorrectly failing to drop columns with invalid dtypes for that aggregation when there are no valid columns (GH 41291)
Bug in DataFrame.rolling.__iter__() where on was not assigned to the index of the resulting objects (GH 40373)
Bug in DataFrameGroupBy.transform() and DataFrameGroupBy.agg() with engine="numba" where *args were being cached with the user passed function (GH 41647)
Bug in DataFrameGroupBy methods agg, transform, sum, bfill, ffill, pad, pct_change, shift, ohlc dropping .columns.names (GH 41497)
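Several of the groupby fixes above mention Kahan summation. The following is a generic pure-Python sketch of that technique, not the pandas implementation (which lives in compiled code); the function names and test data are chosen here for illustration. It carries a running compensation term that captures the low-order bits lost at each addition.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error so far
    for value in values:
        adjusted = value - compensation
        new_total = total + adjusted
        # (new_total - total) is what was actually added; subtracting
        # `adjusted` isolates the rounding error made in this step.
        compensation = (new_total - total) - adjusted
        total = new_total
    return total


values = [0.1] * 100_000
exact = math.fsum(values)  # correctly rounded reference sum
naive_error = abs(naive_sum(values) - exact)
kahan_error = abs(kahan_sum(values) - exact)
print(naive_error, kahan_error)
```

On inputs like this, where many small rounding errors accumulate, the compensated sum should stay much closer to the reference than the naive loop.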
Reshaping#
Bug in merge() raising error when performing an inner join with partial index and right_index=True when there was no overlap between indices (GH 33814)
Bug in DataFrame.unstack() with missing levels led to incorrect index names (GH 37510)
Bug in merge_asof() propagating the right Index with left_index=True and right_on specification instead of left Index (GH 33463)
Bug in DataFrame.join() on a DataFrame with a MultiIndex returned the wrong result when one of the two indexes had only one level (GH 36909)
merge_asof() now raises a ValueError instead of a cryptic TypeError in case of non-numerical merge columns (GH 29130)
Bug in DataFrame.join() not assigning values correctly when the DataFrame had a MultiIndex where at least one dimension had dtype Categorical with non-alphabetically sorted categories (GH 38502)
Series.value_counts() and Series.mode() now return consistent keys in original order (GH 12679, GH 11227 and GH 39007)
Bug in DataFrame.stack() not handling NaN in MultiIndex columns correctly (GH 39481)
Bug in DataFrame.apply() would give incorrect results when the argument func was a string, axis=1, and the axis argument was not supported; now raises a ValueError instead (GH 39211)
Bug in DataFrame.sort_values() not reshaping the index correctly after sorting on columns when ignore_index=True (GH 39464)
Bug in DataFrame.append() returning incorrect dtypes with combinations of ExtensionDtype dtypes (GH 39454)
Bug in DataFrame.append() returning incorrect dtypes when used with combinations of datetime64 and timedelta64 dtypes (GH 39574)
Bug in DataFrame.append() with a DataFrame with a MultiIndex and appending a Series whose Index is not a MultiIndex (GH 41707)
Bug in DataFrame.pivot_table() returning a MultiIndex for a single value when operating on an empty DataFrame (GH 13483)
Index can now be passed to the numpy.all() function (GH 40180)
Bug in DataFrame.stack() not preserving CategoricalDtype in a MultiIndex (GH 36991)
Bug in to_datetime() raising an error when the input sequence contained unhashable items (GH 39756)
Bug in Series.explode() preserving the index when ignore_index was True and values were scalars (GH 40487)
Bug in to_datetime() raising a ValueError when Series contains None and NaT and has more than 50 elements (GH 39882)
Bug in Series.unstack() and DataFrame.unstack() with object-dtype values containing timezone-aware datetime objects incorrectly raising TypeError (GH 41875)
Bug in DataFrame.melt() raising InvalidIndexError when DataFrame has duplicate columns used as value_vars (GH 41951)
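For the Series.explode() entry above, a minimal sketch with made-up data: when the values are scalars and ignore_index=True, the result is expected to get a fresh default integer index instead of keeping the original labels.

```python
import pandas as pd

# Scalar values with a non-default index, chosen for illustration.
s = pd.Series([10, 20], index=["a", "b"])

exploded = s.explode(ignore_index=True)
print(exploded)
```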
Sparse#
Bug in DataFrame.sparse.to_coo() raising a KeyError with columns that are a numeric Index without a 0 (GH 18414)
Bug in SparseArray.astype() with copy=False producing incorrect results when going from integer dtype to floating dtype (GH 34456)
Bug in SparseArray.max() and SparseArray.min() would always return an empty result (GH 40921)
ExtensionArray#
Bug in DataFrame.where() when other is a Series with an ExtensionDtype (GH 38729)
Fixed bug where Series.idxmax(), Series.idxmin(), Series.argmax(), and Series.argmin() would fail when the underlying data is an ExtensionArray (GH 32749, GH 33719, GH 36566)
Fixed bug where some properties of subclasses of PandasExtensionDtype were improperly cached (GH 40329)
Bug in DataFrame.mask() where masking a DataFrame with an ExtensionDtype raises a ValueError (GH 40941)
Styler#
Bug in Styler where the subset argument in methods raised an error for some valid MultiIndex slices (GH 33562)
Styler rendered HTML output has seen minor alterations to support w3 good code standards (GH 39626)
Bug in Styler where rendered HTML was missing a column class identifier for certain header cells (GH 39716)
Bug in Styler.background_gradient() where text-color was not determined correctly (GH 39888)
Bug in Styler.set_table_styles() where multiple elements in CSS-selectors of the table_styles argument were not correctly added (GH 34061)
Bug in Styler where copying from Jupyter dropped the top left cell and misaligned headers (GH 12147)
Bug in Styler.where where kwargs were not passed to the applicable callable (GH 40845)
Bug in Styler causing CSS to duplicate on multiple renders (GH 39395, GH 40334)
Other#
inspect.getmembers(Series) no longer raises an AbstractMethodError (GH 38782)
Bug in Series.where() with numeric dtype and other=None not casting to nan (GH 39761)
Bug in assert_series_equal(), assert_frame_equal(), assert_index_equal() and assert_extension_array_equal() incorrectly raising when an attribute has an unrecognized NA type (GH 39461)
Bug in assert_index_equal() with exact=True not raising when comparing CategoricalIndex instances with Int64Index and RangeIndex categories (GH 41263)
Bug in DataFrame.equals(), Series.equals(), and Index.equals() with object-dtype containing np.datetime64("NaT") or np.timedelta64("NaT") (GH 39650)
Bug in show_versions() where console JSON output was not proper JSON (GH 39701)
Bug in pandas.util.hash_pandas_object() not recognizing hash_key, encoding and categorize when the input object type is a DataFrame (GH 41404)
Contributors#
A total of 251 people contributed patches to this release. People with a “+” by their names contributed a patch for the first time.
Abhishek R +
Ada Draginda
Adam J. Stewart
Adam Turner +
Aidan Feldman +
Ajitesh Singh +
Akshat Jain +
Albert Villanova del Moral
Alexandre Prince-Levasseur +
Andrew Hawyrluk +
Andrew Wieteska
AnglinaBhambra +
Ankush Dua +
Anna Daglis
Ashlan Parker +
Ashwani +
Avinash Pancham
Ayushman Kumar +
BeanNan
Benoît Vinot
Bharat Raghunathan
Bijay Regmi +
Bobin Mathew +
Bogdan Pilyavets +
Brian Hulette +
Brian Sun +
Brock +
Bryan Cutler
Caleb +
Calvin Ho +
Chathura Widanage +
Chinmay Rane +
Chris Lynch
Chris Withers
Christos Petropoulos
Corentin Girard +
DaPy15 +
Damodara Puddu +
Daniel Hrisca
Daniel Saxton
DanielFEvans
Dare Adewumi +
Dave Willmer
David Schlachter +
David-dmh +
Deepang Raval +
Doris Lee +
Dr. Jan-Philip Gehrcke +
DriesS +
Dylan Percy
Erfan Nariman
Eric Leung
EricLeer +
Eve
Fangchen Li
Felix Divo
Florian Jetter
Fred Reiss
GFJ138 +
Gaurav Sheni +
Geoffrey B. Eisenbarth +
Gesa Stupperich +
Griffin Ansel +
Gustavo C. Maciel +
Heidi +
Henry +
Hung-Yi Wu +
Ian Ozsvald +
Irv Lustig
Isaac Chung +
Isaac Virshup
JHM Darbyshire (MBP) +
JHM Darbyshire (iMac) +
Jack Liu +
James Lamb +
Jeet Parekh
Jeff Reback
Jiezheng2018 +
Jody Klymak
Johan Kåhrström +
John McGuigan
Joris Van den Bossche
Jose
JoseNavy
Josh Dimarsky
Josh Friedlander
Joshua Klein +
Julia Signell
Julian Schnitzler +
Kaiqi Dong
Kasim Panjri +
Katie Smith +
Kelly +
Kenil +
Keppler, Kyle +
Kevin Sheppard
Khor Chean Wei +
Kiley Hewitt +
Larry Wong +
Lightyears +
Lucas Holtz +
Lucas Rodés-Guirao
Lucky Sivagurunathan +
Luis Pinto
Maciej Kos +
Marc Garcia
Marco Edward Gorelli +
Marco Gorelli
MarcoGorelli +
Mark Graham
Martin Dengler +
Martin Grigorov +
Marty Rudolf +
Matt Roeschke
Matthew Roeschke
Matthew Zeitlin
Max Bolingbroke
Maxim Ivanov
Maxim Kupfer +
Mayur +
MeeseeksMachine
Micael Jarniac
Michael Hsieh +
Michel de Ruiter +
Mike Roberts +
Miroslav Šedivý
Mohammad Jafar Mashhadi
Morisa Manzella +
Mortada Mehyar
Muktan +
Naveen Agrawal +
Noah
Nofar Mishraki +
Oleh Kozynets
Olga Matoula +
Oli +
Omar Afifi
Omer Ozarslan +
Owen Lamont +
Ozan Öğreden +
Pandas Development Team
Paolo Lammens
Parfait Gasana +
Patrick Hoefler
Paul McCarthy +
Paulo S. Costa +
Pav A
Peter
Pradyumna Rahul +
Punitvara +
QP Hou +
Rahul Chauhan
Rahul Sathanapalli
Richard Shadrach
Robert Bradshaw
Robin to Roxel
Rohit Gupta
Sam Purkis +
Samuel GIFFARD +
Sean M. Law +
Shahar Naveh +
ShaharNaveh +
Shiv Gupta +
Shrey Dixit +
Shudong Yang +
Simon Boehm +
Simon Hawkins
Sioned Baker +
Stefan Mejlgaard +
Steven Pitman +
Steven Schaerer +
Stéphane Guillou +
TLouf +
Tegar D Pratama +
Terji Petersen
Theodoros Nikolaou +
Thomas Dickson
Thomas Li
Thomas Smith
Thomas Yu +
ThomasBlauthQC +
Tim Hoffmann
Tom Augspurger
Torsten Wörtwein
Tyler Reddy
UrielMaD
Uwe L. Korn
Venaturum +
VirosaLi
Vladimir Podolskiy
Vyom Pathak +
WANG Aiyong
Waltteri Koskinen +
Wenjun Si +
William Ayd
Yeshwanth N +
Yuanhao Geng
Zito Relova +
aflah02 +
arredond +
attack68
cdknox +
chinggg +
fathomer +
ftrihardjo +
github-actions[bot] +
gunjan-solanki +
guru kiran
hasan-yaman
i-aki-y +
jbrockmendel
jmholzer +
jordi-crespo +
jotasi +
jreback
juliansmidek +
kylekeppler
lrepiton +
lucasrodes
maroth96 +
mikeronayne +
mlondschien
moink +
morrme
mschmookler +
mzeitlin11
na2 +
nofarmishraki +
partev
patrick
ptype
realead
rhshadrach
rlukevie +
rosagold +
saucoide +
sdementen +
shawnbrown
sstiijn +
stphnlyd +
sukriti1 +
taytzehao
theOehrly +
theodorju +
thordisstella +
tonyyyyip +
tsinggggg +
tushushu +
vangorade +
vladu +
wertha +