Converting between DataFrame and Python data structures

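Prologue

Sometimes we don't want to save a DataFrame to a file or write it to a database; we want to turn it into another format instead, such as a dict, a list, or JSON. Likewise, a DataFrame doesn't have to be read from a file or a database: it can also be built from data we already have in memory. How is this done? Let's take a look.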

Converting a DataFrame to Python data formats

Converting to JSON

A DataFrame can be converted to JSON with the df.to_json() method.

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json())  
# {"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}

The DataFrame was converted to JSON, but the result is not ideal: the index is included as well.
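To make the role of the index concrete, the JSON string can be parsed back with the standard-library json module. This is only a small sketch (the variable name parsed is made up for illustration); it shows that the integer index labels end up as string keys, because JSON object keys are always strings.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Parse the default (column-oriented) JSON back into plain Python objects.
parsed = json.loads(df.to_json())

# The index labels 0..3 come back as the string keys "0".."3".
print(list(parsed["name"].keys()))
# Values are looked up by the string form of the index label.
print(parsed["age"]["2"])
```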

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

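# If you don't want the index, you might try passing index=False
try:
    print(df.to_json(index=False))
except Exception as e:
    print(e)  # 'index=False' is only valid when 'orient' is 'split' or 'table'
# It raises an error: according to this message (from the pandas version used here),
# index=False requires orient to be set to split or table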

So what is this orient parameter?

orient can take the following values: split, records, index, columns, values, table.

Let's go through them one by one and see how the output changes with each value.

  • orient='split'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json(orient="split"))
"""
{
 "columns":["name","age"],
 "index":[0,1,2,3],
 "data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
}
"""
print(df.to_json(orient="split", index=False))
"""
{
 "columns":["name","age"],
 "data":[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
}
"""

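With orient='split' the result has three key-value pairs: the column names, the index, and the data. With index=False the index key is left out.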
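Because the split layout keeps the columns, the index, and the data apart, its three parts line up with the arguments of the DataFrame constructor. The following is a minimal sketch of that round trip, assuming the JSON is parsed with the standard-library json module; payload and rebuilt are illustrative names.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Serialize in the split layout, then parse into a plain dict.
payload = json.loads(df.to_json(orient="split"))

# The three keys map directly onto the DataFrame constructor arguments.
rebuilt = pd.DataFrame(payload["data"],
                       index=payload["index"],
                       columns=payload["columns"])
print(rebuilt)

# With index=False the "index" key is absent from the payload.
no_index = json.loads(df.to_json(orient="split", index=False))
print(sorted(no_index.keys()))
```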

  • orient='records'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json(orient="records"))
"""
[{"name":"mashiro","age":17},
 {"name":"satori","age":17},
 {"name":"koishi","age":16},
 {"name":"nagisa","age":21}]
"""

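This format is used quite often: each row is combined with the column names into a dict, and the dicts are stored in a list. The generated JSON has nothing to do with the index, so index=False is unnecessary here (and in the pandas version used here, passing it raises the error shown earlier).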
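As a quick check of the records layout, the JSON can be parsed back and compared with the output of to_dict(orient="records"). This is a sketch under the assumption that the data holds only strings and integers, which survive the JSON round trip unchanged.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# One dict per row, collected in a list.
records = json.loads(df.to_json(orient="records"))
print(records[0])

# For this simple data, the parsed JSON matches the Python-side records.
print(records == df.to_dict(orient="records"))
```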

  • orient='index'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json(orient="index"))
"""
{
 "0":{"name":"mashiro","age":17},
 "1":{"name":"satori","age":17},
 "2":{"name":"koishi","age":16},
 "3":{"name":"nagisa","age":21}
}
"""

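This is similar to records, except that each row dict is stored as a value in an outer dict whose keys are the corresponding index labels. Here too, index=False cannot be used.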
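Which orient values tolerate index=False has changed between pandas releases, so it can be worth probing the installed version rather than relying on memory. The helper below, accepts_index_false, is a hypothetical name introduced only for this sketch; it simply reports whether the call raises ValueError.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})


def accepts_index_false(orient):
    """Return True if to_json(orient=..., index=False) succeeds, False if it raises ValueError."""
    try:
        df.to_json(orient=orient, index=False)
        return True
    except ValueError:
        return False


# split and table drop the index on request; index and columns are keyed
# by the index labels, so they reject index=False.
for orient in ["split", "table", "index", "columns"]:
    print(orient, accepts_index_false(orient))
```

The records and values orients are left out of the loop on purpose, since their behavior with index=False depends on the pandas version.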

  • orient='columns'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json(orient="columns"))
"""
{"name":{"0":"mashiro","1":"satori","2":"koishi","3":"nagisa"},"age":{"0":17,"1":17,"2":16,"3":21}}
"""

This is the same result we get when orient is not specified; for a DataFrame, orient defaults to columns.
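The claim about the default is easy to verify by comparing the two strings directly. A tiny sketch:

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# For a DataFrame, omitting orient is the same as orient="columns".
default_json = df.to_json()
columns_json = df.to_json(orient="columns")
print(default_json == columns_json)
```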

  • orient='values'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

print(df.to_json(orient="values"))
"""
[["mashiro",17],["satori",17],["koishi",16],["nagisa",21]]
"""
# With orient="values", only the data is kept (no index, no column names)
# This is similar to what to_numpy returns
print(df.to_numpy())
"""
[['mashiro' 17]
 ['satori' 17]
 ['koishi' 16]
 ['nagisa' 21]]
"""
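The similarity between orient="values" and to_numpy can be checked by converting both to nested Python lists. This is a sketch that assumes the data contains only JSON-friendly values (strings and integers); from_json and from_numpy are illustrative variable names.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Nested lists parsed from the JSON produced with orient="values".
from_json = json.loads(df.to_json(orient="values"))

# Nested lists produced from the NumPy array of the same DataFrame.
from_numpy = df.to_numpy().tolist()

print(from_json == from_numpy)
```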
  • orient='table'
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Returned in the form of a database-style table: a schema plus the row data
print(df.to_json(orient="table"))
"""
{
    "schema": {
        "fields": [{"name": "index", "type": "integer"},
                   {"name": "name", "type": "string"},
                   {"name": "age", "type": "integer"}],
        "primaryKey": ["index"],
        "pandas_version": "0.20.0"
    },
    "data": [{"index": 0, "name": "mashiro", "age": 17},
             {"index": 1, "name": "satori", "age": 17},
             {"index": 2, "name": "koishi", "age": 16},
             {"index": 3, "name": "nagisa", "age": 21}]
}
"""
print(df.to_json(orient="table", index=False))
"""
{
    "schema": {
        "fields": [{"name": "name", "type": "string"},
                   {"name": "age", "type": "integer"}],
        "pandas_version": "0.20.0"
    },
    "data": [{"name": "mashiro", "age": 17},
             {"name": "satori", "age": 17},
             {"name": "koishi", "age": 16},
             {"name": "nagisa", "age": 21}]
}
"""
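Since the table layout carries a schema along with the data, it can be read back with pd.read_json using the same orient. A minimal sketch follows; the JSON string is wrapped in io.StringIO because recent pandas releases warn when a literal JSON string is passed to read_json, and restored is just an illustrative name.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Serialize with the schema included.
text = df.to_json(orient="table")

# Read it back; the schema tells pandas which fields exist and which one is the index.
restored = pd.read_json(StringIO(text), orient="table")
print(restored)
```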

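Converting to dict

A DataFrame can also be converted to a dict. to_dict has an orient parameter too, and some of its values behave like their to_json counterparts. That is no surprise: a JSON object maps directly onto a Python dict (JSON's syntax comes from JavaScript object literals, which look very much like Python dict literals).

In to_dict, orient can take the following values: dict, list, series, split, records, index. The default is dict.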

  • orient='dict'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

pprint(df.to_dict(orient="dict"))
"""
{'age': {0: 17, 1: 17, 2: 16, 3: 21},
 'name': {0: 'mashiro', 1: 'satori', 2: 'koishi', 3: 'nagisa'}}
"""
  • orient='list'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

pprint(df.to_dict(orient="list"))
"""
{'age': [17, 17, 16, 21], 'name': ['mashiro', 'satori', 'koishi', 'nagisa']}
"""
  • orient='series'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# This structure is rarely used: each key maps to a Series
pprint(df.to_dict(orient="series"))
"""
{'age': 
0    17
1    17
2    16
3    21
Name: age, dtype: int64,

'name': 0    mashiro
1     satori
2     koishi
3     nagisa
Name: name, dtype: object}
"""
  • orient='split'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

pprint(df.to_dict(orient="split"))
"""
{'columns': ['name', 'age'],
 'data': [['mashiro', 17], ['satori', 17], ['koishi', 16], ['nagisa', 21]],
 'index': [0, 1, 2, 3]}
"""
  • orient='records'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

pprint(df.to_dict(orient="records"))
"""
[{'age': 17, 'name': 'mashiro'},
 {'age': 17, 'name': 'satori'},
 {'age': 16, 'name': 'koishi'},
 {'age': 21, 'name': 'nagisa'}]
"""
  • orient='index'
from pprint import pprint
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

pprint(df.to_dict(orient="index"))
"""
{0: {'age': 17, 'name': 'mashiro'},
 1: {'age': 17, 'name': 'satori'},
 2: {'age': 16, 'name': 'koishi'},
 3: {'age': 21, 'name': 'nagisa'}}
"""
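One practical difference between to_dict and to_json shows up in the keys: to_dict keeps the index labels as they are, while JSON turns them into strings. The sketch below compares the two for orient="index", assuming the default integer index used throughout this article; as_dict and as_json are illustrative names.

```python
import json

import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

as_dict = df.to_dict(orient="index")
as_json = json.loads(df.to_json(orient="index"))

# to_dict keeps the integer index labels as keys ...
print(list(as_dict.keys()))
# ... while the JSON route yields string keys.
print(list(as_json.keys()))

# Converting the JSON keys back to int makes the two structures equal.
print({int(k): v for k, v in as_json.items()} == as_dict)
```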

Converting Python data formats to a DataFrame

Converting a dict to a DataFrame

import pandas as pd

data = {0: {'age': 17, 'name': 'mashiro'},
        1: {'age': 17, 'name': 'satori'},
        2: {'age': 16, 'name': 'koishi'},
        3: {'age': 21, 'name': 'nagisa'}}

df = pd.DataFrame.from_dict(data)
# Clearly not the layout we expected: the outer keys became columns
print(df)
"""
            0       1       2       3
age        17      17      16      21
name  mashiro  satori  koishi  nagisa
"""

df = pd.DataFrame.from_dict(data, orient="index")
print(df)
"""
   age     name
0   17  mashiro
1   17   satori
2   16   koishi
3   21   nagisa
"""

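So df.to_dict and pd.DataFrame.from_dict perform opposite conversions. Note, however, that the orient parameter of from_dict is more limited: in the version used here it accepts only index or columns, and it defaults to columns.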
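The pairing of to_dict and from_dict can be sketched as a round trip. The example below uses orient="index" in both directions and, as a second case, feeds the output of to_dict(orient="list") to from_dict with its default columns orientation; by_index and by_columns are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Outer keys are index labels, so from_dict needs orient="index".
by_index = pd.DataFrame.from_dict(df.to_dict(orient="index"), orient="index")
print(by_index)

# Outer keys are column names, which matches the default orient="columns".
by_columns = pd.DataFrame.from_dict(df.to_dict(orient="list"))
print(by_columns)
```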

from_records

from_records is meant for record-style data whose outer container is a list, such as a list of dicts.

import pandas as pd

data = [{'age': 17, 'name': 'mashiro'},
        {'age': 17, 'name': 'satori'},
        {'age': 16, 'name': 'koishi'},
        {'age': 21, 'name': 'nagisa'}]

df = pd.DataFrame.from_records(data)
print(df)
"""
   age     name
0   17  mashiro
1   17   satori
2   16   koishi
3   21   nagisa
"""

In fact, this is exactly the kind of data produced by to_dict(orient="records").
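That relationship can be sketched as a round trip: export the rows with to_dict(orient="records") and rebuild the DataFrame with from_records. The plain pd.DataFrame constructor also accepts a list of dicts, which is included for comparison; rebuilt and via_constructor are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"name": ["mashiro", "satori", "koishi", "nagisa"],
                   "age": [17, 17, 16, 21]})

# Export one dict per row, then rebuild the DataFrame from those records.
records = df.to_dict(orient="records")
rebuilt = pd.DataFrame.from_records(records)
print(rebuilt)

# The regular constructor handles a list of dicts as well.
via_constructor = pd.DataFrame(records)
print(via_constructor)
```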

