ValueError: Length mismatch at tmpdf.index = complex_id when drawing a chord diagram of CellphoneDB results with ktplotspy


Error message

Full traceback example
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[48], line 1
----> 1 kpy.plot_cpdb_chord(
2 adata=adata_cpdb,
3 cell_type1=".",
4 cell_type2='.',
5 means=means,
6 pvals=pvals,
7 deconvoluted=decon,
8 celltype_key="celltype",
9 link_offset=1,
11 )

File ~/anaconda3/envs/jupyterlab/lib/python3.10/site-packages/ktplotspy/plot/plot_cpdb_chord.py:170, in plot_cpdb_chord(adata, means, pvals, deconvoluted, celltype_key, interaction, cell_type1, cell_type2, keep_celltypes, remove_self, layer, sector_colors, sector_text_kwargs, sector_radius_limit, equal_sector_size, same_producer_colors, link_colors, link_offset, link_kwargs, legend_width, legend_kwargs, **plot_cpdb_kwargs)
168 simple_2[i] = re.sub(partner_2[i] + "_|_" + partner_2[i], "", simple_2[i])
169 tmpdf = pd.concat([pd.DataFrame(zip(simple_1, partner_1)), pd.DataFrame(zip(partner_2, simple_2))])
--> 170 tmpdf.index = complex_id
171 tmpdf.columns = ["id_a", "id_b"]
172 _interactions_subset = pd.concat([_interactions_subset, tmpdf], axis=1)

File ~/anaconda3/envs/jupyterlab/lib/python3.10/site-packages/pandas/core/generic.py:6313, in NDFrame.__setattr__(self, name, value)
6311 try:
6312 object.__getattribute__(self, name)
-> 6313 return object.__setattr__(self, name, value)
6314 except AttributeError:
6315 pass

File properties.pyx:69, in pandas._libs.properties.AxisProperty.__set__()

File ~/anaconda3/envs/jupyterlab/lib/python3.10/site-packages/pandas/core/generic.py:814, in NDFrame._set_axis(self, axis, labels)
809 """
810 This is called from the cython code when we set the `index` attribute
811 directly, e.g. `series.index = [1, 2, 3]`.
812 """
813 labels = ensure_index(labels)
--> 814 self._mgr.set_axis(axis, labels)
815 self._clear_item_cache()

File ~/anaconda3/envs/jupyterlab/lib/python3.10/site-packages/pandas/core/internals/managers.py:238, in BaseBlockManager.set_axis(self, axis, new_labels)
236 def set_axis(self, axis: AxisInt, new_labels: Index) -> None:
237 # Caller is responsible for ensuring we have an Index object.
--> 238 self._validate_set_axis(axis, new_labels)
239 self.axes[axis] = new_labels

File ~/anaconda3/envs/jupyterlab/lib/python3.10/site-packages/pandas/core/internals/base.py:98, in DataManager._validate_set_axis(self, axis, new_labels)
95 pass
97 elif new_len != old_len:
---> 98 raise ValueError(
99 f"Length mismatch: Expected axis has {old_len} elements, new "
100 f"values have {new_len} elements"
101 )

ValueError: Length mismatch: Expected axis has 86 elements, new values have 78 elements

Problem analysis

complex_id is shorter than tmpdf.index: in the traceback above, tmpdf has 86 rows while complex_id has only 78 entries.

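The source code uses the interacting_pair field to sort each interaction into either complex_id or simple_id. The criterion is the number of underscores in that field.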
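To illustrate the idea of splitting by underscore count, here is a minimal sketch. The pair names are invented, and the threshold used here (more than one underscore means a complex is involved) is an assumption made for illustration; the exact rule in ktplotspy may differ.

```python
# Hypothetical interacting_pair values, made up for this sketch.
pairs = ["GENEA_GENEB", "GENEC_cplx_one", "cplx_two_GENED"]

# Assumed rule: a single underscore separates two simple partners,
# so any extra underscore must come from a complex name.
complex_id = [p for p in pairs if p.count("_") > 1]
simple_id = [p for p in pairs if p.count("_") == 1]

print(complex_id)
print(simple_id)
```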

The problem then arises in the code that runs after this split:

_interactions_subset = interactions_subset.loc[complex_id].copy()
_interactions_subset_simp = interactions_subset.loc[simple_id].copy()


# the problem starts here
complex_idx1 = [i for i, j in _interactions_subset.partner_b.items() if re.search("complex:", j)]
complex_idx2 = [i for i, j in _interactions_subset.partner_a.items() if re.search("complex:", j)]
# the problem ends here


# complex_idx
simple_1 = list(_interactions_subset.loc[complex_idx1, "interacting_pair"])
simple_2 = list(_interactions_subset.loc[complex_idx2, "interacting_pair"])
partner_1 = [re.sub("complex:", "", b) for b in _interactions_subset.loc[complex_idx1, "partner_b"]]
partner_2 = [re.sub("complex:", "", a) for a in _interactions_subset.loc[complex_idx2, "partner_a"]]
for i, _ in enumerate(simple_1):
    simple_1[i] = re.sub(partner_1[i] + "_|_" + partner_1[i], "", simple_1[i])
for i, _ in enumerate(simple_2):
    simple_2[i] = re.sub(partner_2[i] + "_|_" + partner_2[i], "", simple_2[i])
tmpdf = pd.concat([pd.DataFrame(zip(simple_1, partner_1)), pd.DataFrame(zip(partner_2, simple_2))])
tmpdf.index = complex_id

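In this CellphoneDB result, some rows have complex: in both partner_a and partner_b. Such a row is placed in both complex_idx1 and complex_idx2, so it appears twice in tmpdf, and tmpdf ends up with more rows than complex_id has entries.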
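The double counting can be reproduced on a small scale. The sketch below uses an invented three-row table as a stand-in for _interactions_subset (the column names follow the excerpt above; the values and index labels are made up), where the last row has a complex on both sides.

```python
import re

import pandas as pd

# Toy stand-in for _interactions_subset; all values are invented.
toy = pd.DataFrame(
    {
        "interacting_pair": ["A_cplxB", "cplxC_D", "cplxE_cplxF"],
        "partner_a": ["simple:A", "complex:cplxC", "complex:cplxE"],
        "partner_b": ["complex:cplxB", "simple:D", "complex:cplxF"],
    },
    index=["id1", "id2", "id3"],
)
complex_id = list(toy.index)

# Same selection logic as in the excerpt above.
complex_idx1 = [i for i, j in toy.partner_b.items() if re.search("complex:", j)]
complex_idx2 = [i for i, j in toy.partner_a.items() if re.search("complex:", j)]

# Rows selected by both lists are the ones that get duplicated.
duplicated = sorted(set(complex_idx1) & set(complex_idx2))
n_rows = len(complex_idx1) + len(complex_idx2)
print(duplicated, n_rows, len(complex_id))

# A frame with n_rows rows cannot take an index of len(complex_id) labels.
tmpdf = pd.DataFrame({"placeholder": range(n_rows)})
try:
    tmpdf.index = complex_id
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)
```

Here the row id3 lands in both lists, so four rows are built for three labels and pandas raises the same kind of Length mismatch error as in the traceback.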

Workaround

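Until the author fixes this, a temporary workaround is to edit the installed plot_cpdb_chord.py so that, when complex_idx2 is built, rows already present in complex_idx1 are skipped.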

complex_idx2 = [
    i for i, j in _interactions_subset.partner_a.items()
    if re.search("complex:", j) and i not in complex_idx1
]
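As a quick sanity check, the same filter can be tried on a small invented table (a stand-in for _interactions_subset, with made-up values) to confirm that the row count now matches the number of labels. This sketch checks only the length; it does not verify anything else about the resulting plot.

```python
import re

import pandas as pd

# Toy stand-in for _interactions_subset; all values are invented.
toy = pd.DataFrame(
    {
        "partner_a": ["simple:A", "complex:cplxC", "complex:cplxE"],
        "partner_b": ["complex:cplxB", "simple:D", "complex:cplxF"],
    },
    index=["id1", "id2", "id3"],
)
complex_id = list(toy.index)

complex_idx1 = [i for i, j in toy.partner_b.items() if re.search("complex:", j)]
# Workaround: skip rows that complex_idx1 has already claimed.
complex_idx2 = [
    i for i, j in toy.partner_a.items()
    if re.search("complex:", j) and i not in complex_idx1
]

n_rows = len(complex_idx1) + len(complex_idx2)
tmpdf = pd.DataFrame({"placeholder": range(n_rows)})
tmpdf.index = complex_id  # no longer raises: 3 rows, 3 labels
print(complex_idx1, complex_idx2, n_rows)
```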