如何从JSON中提取嵌套的值？

我有下面的代码, 输出中也需要状态和* policy工作流名称(即星期二), 任何人都可以帮忙。

import json
import requests
import pandas as pd
from pandas.io.json import json_normalize
import ast
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 100)

a = [{'attributes': [{'key': '*policy action jobid', 'values': ['289903']}, {'key': '*policy action name', 'values': ['backup']}, {'key': '*policy name', 'values': ['Daily_Backups']}, {'key': '*policy workflow name', 'values': ['tuesday']}, {'key': 'clone retention policy', 'values': ['   504:   5:  34']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:41+08:00', 'clientHostname': 'xyz.com', 'clientId': '7d391c52-00000004-5cda459d-5c1', 'creationTime': '2020-04-28T21:29:25+08:00', 'fileCount': 0, 'id': '1eb1', 'instances': [], 'level': 'Full', 'links': [{'href': 'https://iservera/backups/1ec1', 'rel': 'item'}], 'name': '/abc', 'retentionTime': '2020-05-19T23:57:41+08:00', 'saveTime': '2020-04-28T21:27:07+08:00', 'shortId': '2177', 'size': {'unit': 'Byte', 'value': 0}, 'type': 'File'}, {'attributes': [{'key': '*policy action jobid', 'values': ['2803']}, {'key': '*policy action name', 'values': ['backup: 1589']}, {'key': '*policy name', 'values': ['Daily_Backups: 159']}, {'key': '*policy workflow name', 'values': ['tuesday: 1588079529']}, {'key': '*ss clone retention', 'values': ['          1588079529:          1588079590:   1824409']}, {'key': 'group', 'values': ['tuesday']}, {'key': 'saveset features', 'values': ['CLIENT_SAVETIME']}], 'browseTime': '2020-05-19T23:57:42+08:00', 'clientHostname': 'abc.com', 'clientId': 'ec3dc1', 'completionTime': '2020-04-28T21:29:47+08:00', 'creationTime': '2020-04-28T21:13:10+08:00', 'fileCount': 0, 'id': 'cc1', 'instances': [{'clone': False, 'id': '1588079529', 'status': 'Aborted', 'volumeIds': ['245614341']}], 'level': 'Full', 'links': [{'href': 'https://abc/backups/c771', 'rel': 'item'}], 'name': '/xyz', 'retentionTime': '2020-05-19T23:57:42+08:00', 'saveTime': '2020-04-28T21:10:53+08:00', 'shortId': '2141727718', 'size': {'unit': 'Byte', 'value': 36264099844}, 'type': 'NDMP'}]
df = json_normalize(a)
a = df[['clientHostname', 'completionTime', 'size.value', 'type', 'fileCount']]
print(a)```

Current output is-
```  clientHostname             completionTime   size.value  type  fileCount
0        xyz.com                        NaN            0  File          0
1        abc.com  2020-04-28T21:29:47+08:00  36264099844  NDMP          0```

Output needed is-
  ```clientHostname             completionTime   size.value  type  fileCount status  Policy
0        xyz.com                        NaN            0  File          0   -      tuesday
1        abc.com  2020-04-28T21:29:47+08:00  36264099844  NDMP          0  Aborted tuesday```

#1

你可以获取策略工作流名称为df [‘attributes’] [0] [3] [‘key’], df [‘attributes’] [0] [3] [‘values’]和状态为df [‘instances ‘] [i] [0] [‘status’], 其中i是记录号。

#2

我将使用库jmespath遍历json数据：

要访问密钥, 请使用。要访问列表, 请使用[]表示法。

import jmespath
expression = jmespath.compile("""
                              [].
                               {clientHostname:clientHostname, completionTime:completionTime, "size.value":size.value, type:type, fileCount:fileCount, status:instances[].status, Policy:attributes[?key==`*policy workflow name`].values[]}
                               """)
res = expression.search(a)
res

[{'clientHostname': 'xyz.com', 'completionTime': None, 'size.value': 0, 'type': 'File', 'fileCount': 0, 'status': [], 'Policy': ['tuesday']}, {'clientHostname': 'abc.com', 'completionTime': '2020-04-28T21:29:47+08:00', 'size.value': 36264099844, 'type': 'NDMP', 'fileCount': 0, 'status': ['Aborted'], 'Policy': ['tuesday: 1588079529']}]

进行一些清理以适合你的用例：

#get the day of the week for policy keys
res = [{key:value[0].split(':')[0] 
       if key=="Policy" else value 
       for key, value in ent.items()} for ent in res]

#if list is empty, replace it with "-"
res = [{key:"-" 
       if (isinstance(value, list) and not value) else value 
       for key, value in ent.items()} for ent in res]

 pd.DataFrame(res)

clientHostname  completionTime  size.value  type    fileCount   status  Policy
0   xyz.com None    0   File    0   -   tuesday
1   abc.com 2020-04-28T21:29:47+08:00   36264099844 NDMP    0   [Aborted]   tuesday

你可能需要查看数据, 看看是否需要进行其他转换

来源：

https://www.srcmini02.com/57972.html

#1

#2

猜你喜欢