pandas – mihonak

I needed to blank out the fields of a CSV file that contain nothing but a single half-width (ASCII) space, so I wrote some code to do the conversion.

'c1','c2','c3'
1,2,3
1,' ',3
1,2,3

I want to turn the file above into this:

'c1','c2','c3'
1,2,3
1,,3
1,2,3

That is, the field that holds only a space should become empty.

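The standard library's csv module has a reader function with a skipinitialspace parameter, which defaults to False. Setting it to True makes the reader ignore spaces that immediately follow the delimiter, so an unquoted field holding only a space is read as an empty string. A space enclosed in the quote character, such as ' ' when quotechar is ', is part of the field value and is not skipped, so it has to be blanked explicitly.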

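import csv

# newline='' lets the csv module handle line endings itself
with open('sample.csv', 'r', encoding='cp932', newline='') as csv_file, \
        open('sample_nospaces.csv', 'w', encoding='cp932', newline='') as newfile:
    # skipinitialspace=True drops spaces that directly follow a delimiter
    reader = csv.reader(csv_file, delimiter=',', doublequote=True, quotechar="'", skipinitialspace=True)
    writer = csv.writer(newfile, doublequote=True, lineterminator='\n')
    header = next(reader)
    writer.writerow(header)
    for row in reader:
        # blank out fields that hold only whitespace (this also covers quoted spaces such as ' ')
        row = ['' if field.strip() == '' else field for field in row]
        writer.writerow(row)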
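
A quick way to see which spaces skipinitialspace actually removes is to feed csv.reader a couple of in-memory lines. The sketch below is only an illustration: the helper name read_rows and the sample strings are made up for this example and are not part of the original script.

```python
import csv
import io


def read_rows(text, **kwargs):
    """Parse CSV text held in memory and return the rows as lists of strings."""
    return list(csv.reader(io.StringIO(text, newline=''), **kwargs))


# Unquoted space right after the delimiter: skipped when skipinitialspace=True
unquoted = read_rows("1, ,3\r\n", skipinitialspace=True)

# Space inside the quote character: part of the field value, so it is kept
quoted = read_rows("1,' ',3\r\n", quotechar="'", skipinitialspace=True)

# Explicitly blank any field that consists only of whitespace
cleaned = [['' if field.strip() == '' else field for field in row] for row in quoted]

print(unquoted)  # [['1', '', '3']]
print(quoted)    # [['1', ' ', '3']]
print(cleaned)   # [['1', '', '3']]
```

If the real file only ever contains unquoted spaces, skipinitialspace=True alone is enough. The explicit strip check is the safer choice when the spaces may be quoted.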

Incidentally, pandas' isin() can be used to locate the affected cells.

import pandas as pd

df = pd.DataFrame([
    [1,2,3],
    [1,' ',3],
    [1,2,3]
], columns=['c1','c2','c3'])

print(df.isin([' ']))

#      c1     c2     c3
# 0  False  False  False
# 1  False   True  False
# 2  False  False  False
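
Once the cells have been located, the same cleanup could also be done in pandas. The following is a minimal sketch, assuming the data is already in a DataFrame like the one above. Using DataFrame.replace and to_csv here is my own choice for illustration, not something the original script does.

```python
import pandas as pd

df = pd.DataFrame([
    [1, 2, 3],
    [1, ' ', 3],
    [1, 2, 3]
], columns=['c1', 'c2', 'c3'])

# Replace cells whose whole value is a single space with an empty string
cleaned = df.replace(' ', '')

# Serialize without the index column; empty strings become empty fields
csv_text = cleaned.to_csv(index=False)
print(csv_text)
```

Note that replace without regex=True matches whole cell values only, so a space inside a longer string is left alone.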

The following extracts only the affected rows, which is useful for things like counting them.

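# keep only matching cells (others become NaN), then drop rows with no match
index = df[df.isin([' '])].dropna(how='all').index
# index holds labels, so select with .loc rather than .iloc
print(df.loc[index])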

#    c1 c2  c3
# 1   1      3
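
For the counting mentioned above, the boolean mask from isin() can be summed directly. This is a small sketch under the same sample data. The variable names mask, per_column, total and rows are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([
    [1, 2, 3],
    [1, ' ', 3],
    [1, 2, 3]
], columns=['c1', 'c2', 'c3'])

mask = df.isin([' '])

per_column = mask.sum()               # number of matching cells in each column
total = int(mask.to_numpy().sum())    # number of matching cells overall
rows = df[mask.any(axis=1)]           # rows with at least one matching cell

print(per_column)
print(total)      # 1
print(len(rows))  # 1
```

Selecting with mask.any(axis=1) gives the same rows as the dropna approach, and it does not depend on the index being the default 0, 1, 2, ...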

Category: pandas

How to remove extra spaces from a CSV file in Python