Once data is read into Python, a first step is to analyze the data with summary statistics. This is especially true if the data set is large. Summary statistics include the count, mean, standard deviation, maximum, minimum, and quartile information for the data columns.

Run the next cell to:
- generate n linearly spaced values between 0 and n-1 with np.linspace(start,end,count)
- generate n uniformly distributed random values with np.random.rand(count)
- generate n normally distributed random values with np.random.normal(mean,std,count)
- combine time, x, and y with a vertical stack np.vstack and transpose .T for column-oriented data
- save the data to the file 03-data.csv with the header time,x,y

import numpy as np
np.random.seed(0)
n = 1000
time = np.linspace(0,n-1,n)
x = np.random.rand(n)
y = np.random.normal(1,1,n)
data = np.vstack((time,x,y)).T
np.savetxt('03-data.csv',data,header='time,x,y',delimiter=',',comments='')

The histogram below is a preview of creating graphics so that data can be evaluated visually. 04. Visualize shows how to create plots to analyze data.
import matplotlib.pyplot as plt
%matplotlib inline
plt.hist(x,10,label='x')
plt.hist(y,60,label='y',alpha=0.7)
plt.ylabel('Count'); plt.legend()
plt.show()

numpy

The np.loadtxt function reads the CSV data file 03-data.csv. Numpy calculates size (dimensions), mean (average), std (standard deviation), and median as summary statistics. If the axis is not specified, numpy computes the statistic over all values in the array; axis=0 gives one statistic per column (down the rows) and axis=1 gives one statistic per row (across the columns).
import numpy as np
data = np.loadtxt('03-data.csv',delimiter=',',skiprows=1)
print('Dimension (rows,columns):')
print(np.size(data,0),np.size(data,1))
print('Average:')
print(np.mean(data,axis=0))
print('Standard Deviation:')
print(np.std(data,0))
print('Median:')
print(np.median(data,0))
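A minimal sketch of the axis behavior, using a small 2x3 array rather than the CSV data:

```python
import numpy as np

A = np.array([[1.,2.,3.],
              [4.,5.,6.]])
print(np.mean(A))         # no axis: mean of all values -> 3.5
print(np.mean(A,axis=0))  # one mean per column -> [2.5 3.5 4.5]
print(np.mean(A,axis=1))  # one mean per row -> [2. 5.]
```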

Calculate the skew of x*y with the scipy.stats skew function.
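One possible solution, regenerating x and y here so the cell is self-contained (assumes the same seed and sizes as the data-generation cell above):

```python
import numpy as np
from scipy.stats import skew

np.random.seed(0)               # same seed as the earlier cell
n = 1000
x = np.random.rand(n)           # uniform samples
y = np.random.normal(1,1,n)     # normal samples
s = skew(x*y)                   # sample skewness of the elementwise product
print('skew of x*y:', s)
```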

pandas

Pandas simplifies data analysis with the .describe() method of a DataFrame created with pd.read_csv(). Note that the data file can either be a local file name or a web address such as
import pandas as pd
url='https://apmonitor.com/pdc/uploads/Main/tclab_data2.txt'
data = pd.read_csv(url)
data.describe()
import pandas as pd
data = pd.read_csv('03-data.csv')
data.describe()
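Individual statistics can be pulled out of the .describe() table with .loc. A small sketch, using a hypothetical DataFrame in place of 03-data.csv:

```python
import pandas as pd
import numpy as np

# hypothetical stand-in for pd.read_csv('03-data.csv')
df = pd.DataFrame({'x': np.arange(5.0), 'y': np.ones(5)})
stats = df.describe()
print(stats.loc['mean','x'])    # mean of column x -> 2.0
print(stats.loc['count','y'])   # number of samples in y -> 5.0
```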


Generate a file from the TCLab data with seconds (t), heater levels (Q1 and Q2), and temperatures (lab.T1 and lab.T2). Record data every second for 120 seconds and change the heater levels every 30 seconds to a random number between 0 and 80 with np.random.randint(). There is no need to change this program, only run it for 2 minutes to collect the data. If you do not have a TCLab device, the program reads a data file from an online link.
import tclab, time, csv
import pandas as pd
import numpy as np
try:
    # connect to TCLab if available
    n = 120
    with open('03-tclab1.csv',mode='w',newline='') as f:
        cw = csv.writer(f)
        cw.writerow(['Time','Q1','Q2','T1','T2'])
        with tclab.TCLab() as lab:
            print('t Q1 Q2 T1 T2')
            for t in range(n):
                if t%30==0:
                    Q1 = np.random.randint(0,81)
                    Q2 = np.random.randint(0,81)
                    lab.Q1(Q1); lab.Q2(Q2)
                cw.writerow([t,Q1,Q2,lab.T1,lab.T2])
                if t%5==0:
                    print(t,Q1,Q2,lab.T1,lab.T2)
                time.sleep(1)
    file = '03-tclab1.csv'
    data1 = pd.read_csv(file)
except:
    print('No TCLab device found, reading online file')
    url = 'http://apmonitor.com/do/uploads/Main/tclab_dyn_data2.txt'
    data1 = pd.read_csv(url)
Use requests to download a sample TCLab data file for the analysis. It is saved as 03-tclab2.csv.
import requests
import os
url = 'http://apmonitor.com/pdc/uploads/Main/tclab_data2.txt'
r = requests.get(url)
with open('03-tclab2.csv', 'wb') as f:
    f.write(r.content)
print('File 03-tclab2.csv retrieved to current working directory: ')
print(os.getcwd())
Read the files 03-tclab1.csv and 03-tclab2.csv and display summary statistics for each with data.describe(). Use the summary statistics to compare the number of samples and differences in average and standard deviation value for T1 and T2.
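A sketch of the comparison, using two small synthetic DataFrames as stand-ins; once the cells above have run, replace them with pd.read_csv('03-tclab1.csv') and pd.read_csv('03-tclab2.csv'):

```python
import pandas as pd
import numpy as np

np.random.seed(1)
# synthetic stand-ins for the two TCLab files (hypothetical values)
d1 = pd.DataFrame({'T1': np.random.normal(35,2,120),
                   'T2': np.random.normal(33,2,120)})
d2 = pd.DataFrame({'T1': np.random.normal(40,3,600),
                   'T2': np.random.normal(38,3,600)})

for name, data in [('03-tclab1.csv',d1),('03-tclab2.csv',d2)]:
    s = data.describe()
    print(name,'samples:',int(s.loc['count','T1']))
    print(s.loc[['mean','std'],['T1','T2']])  # compare average and spread
```

The count row shows how many samples each file contains, and the mean and std rows show how the T1 and T2 temperatures differ between the two tests.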