pandas

The input file Returns.csv

Not counting the header line, the input file has 33 lines. Each of the three dates appears on 11 lines. The only fields we will use are
field 0: Date
field 2: Long Exp (exposure)
field 3: Long Perf (performance)

Date,Sector,Long Exp,Long Perf,Short Exp,Short Perf,,,,,
1/1/04,Consumer Discretionary,299192447.68,-299192.45,653769973.85,-653769.97,,,,,
1/1/04,Consumer Staples,806817013.88,-806817.01,155938778.63,-155938.78,,,,,
1/1/04,Energy,495331847.14,-495331.85,331916571.89,331916.57,,,,,
1/1/04,Financials,438551829.34,438551.83,375651108.20,213241.20,,,,,
1/1/04,Health Care,353562123.03,-353562.12,434401773.16,434401.77,,,,,
1/1/04,Industrials,85710566.66,85710.57,620244575.40,127634.20,,,,,
1/1/04,Information Technology,731335638.19,731335.64,513751410.08,-513751.41,,,,,
1/1/04,Materials,606915176.25,606915.18,130279981.29,342723.20,,,,,
1/1/04,Real Estate,690597304.21,452367.23,836765311.59,-836765.31,,,,,
1/1/04,Telecommunication Services,432774295.12,234872.30,542978125.25,542978.13,,,,,
1/1/04,Utilities,577593316.82,577593.32,335109028.81,-335109.03,,,,,
1/2/14,Consumer Discretionary,298893255.23,-125425.22,654423743.83,-134607.22,,,,,
1/2/14,Consumer Staples,806010196.87,-588526.41,156094717.41,-127539.85,,,,,
1/2/14,Energy,494836515.30,-262532.72,331584655.32,280881.29,,,,,
1/2/14,Financials,438990381.17,57815.38,375437867.00,86311.46,,,,,
1/2/14,Health Care,353208560.91,-214415.23,433967371.39,193435.12,,,,,
1/2/14,Industrials,85796277.22,1955.27,620116941.20,126538.15,,,,,
1/2/14,Information Technology,732066973.82,631985.26,514265161.49,-105652.38,,,,,
1/2/14,Materials,607522091.43,563465.03,129937258.09,169649.01,,,,,
1/2/14,Real Estate,691049671.44,424365.40,837602076.91,-371225.04,,,,,
1/2/14,Telecommunication Services,433009167.42,101597.69,542435147.13,408266.07,,,,,
1/2/14,Utilities,578170910.14,515323.44,335444137.84,-327443.68,,,,,
1/3/14,Consumer Discretionary,298767830.02,-77320.87,654558351.05,-8764.87,,,,,
1/3/14,Consumer Staples,805421670.46,-154150.06,156222257.26,-9682.94,,,,,
1/3/14,Energy,494573982.58,-34813.51,331303774.02,25961.21,,,,,
1/3/14,Financials,439048196.55,38058.06,375351555.53,23288.89,,,,,
1/3/14,Health Care,352994145.67,-173723.40,433773936.27,117726.12,,,,,
1/3/14,Industrials,85798232.49,1367.79,619990403.05,10269.29,,,,,
1/3/14,Information Technology,732698959.08,627507.50,514370813.88,-6261.26,,,,,
1/3/14,Materials,608085556.46,376862.73,129767609.08,88234.13,,,,,
1/3/14,Real Estate,691474036.84,199717.67,837973301.95,-68413.24,,,,,
1/3/14,Telecommunication Services,433110765.11,50143.34,542026881.06,402088.95,,,,,
1/3/14,Utilities,578686233.57,209828.94,335771581.52,-174046.48,,,,,

Pandas

A student rhapsodized about the power of the two yellow statements.

Do the same thing without pandas.

Lines 19–26 of pandas.py will do the work of the two yellow statements, using only the machinery covered in this course. First, install these three packages. (Microsoft people have to say py.exe -m pip instead of pip3.)

pip3 install pandas
pip3 install matplotlib
pip3 install openpyxl

pip3 list

The input file contains 33 lines of data, not counting the header line. That’s why the list lines in line 51 contains 33 items. Each of these items is a tuple containing three items: a datetime.date and two numbers. (We saw class datetime.date in Class Date.) For example, the first three tuples in lines are

(2004-01-01, 299192447.68, -299192.45)
(2004-01-01, 806817013.88, -806817.01)
(2004-01-01, 495331847.14, -495331.85)

The input file contains only three different dates:
January 1, 2004
January 2, 2014
January 3, 2014
That’s why the list dates in line 57 contains only three items.

The list returned by the sorted in line 64 contains 11 items, but the highestExps list in that line contains only the last n items because of the index [-n:] which we saw in Indexing. Ditto for line 75.

pandas.py

1.538153e+09 means
1.538153 × 10 × 10 × 10 × 10 × 10 × 10 × 10 × 10 × 10 = 1.538153 × 109 = 1,538,153,000
In other words, move the decimal point nine places to the right. The e stands for exponent.

top3LongExp =
Date
2004-01-01    1.538153e+09
2014-01-02    1.538077e+09
2014-01-03    1.538121e+09
Name: Long Exp, dtype: float64

top3LongPerf =
Date
2004-01-01    -75481.37
2014-01-02     43458.85
2014-01-03    473357.44
dtype: float64

my top3LongExp =
2004-01-01    1.538153e+09
2014-01-02    1.538077e+09
2014-01-03    1.538121e+09

my top3LongPerf =
2004-01-01    -75481.37
2014-01-02     43458.85
2014-01-03    473357.44