
[python #HGQ-627384]: Python Coding Inquiry


Thanks for taking the time to e-mail us with your Python question; we are 
definitely here to help! Thanks also for providing all of this information and 
code. First off, I want to show you a shortcut, if you want to take some time 
to learn the very useful Pandas package (https://pandas.pydata.org/). If you 
have Pandas installed with pip or conda, as you would any other package, a few 
lines are enough to read your whole file into an easy-to-use DataFrame:

  from urllib.request import urlopen
  import numpy as np
  import pandas as pd

  # Since you know what data you're looking at here, let's go ahead and specify
  # the names we want the columns to have
  data_columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second', 'Speed',
                  'Direction']

  with urlopen('http://www.oswego.edu/met_class/tower/WND202003') as data:
      # We will use the Pandas read_fwf to read our fixed-width data from
      # this page into a DataFrame
      df = pd.read_fwf(data, names=data_columns)

from which you'll get back something that looks like attachment 1.
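
Once the file is in a DataFrame, the other two goals from your message (finding 
a line with specific numbers and printing it) come down to a short filter. Here 
is a minimal sketch, not a definitive recipe. The three sample rows are made up 
to stand in for the tower file (the real spacing and values may differ), the 
'Direction' column name is assumed from the variable names in your code, and 
na_values is used so that no-data entries written as '/////' or '//////' come 
back as NaN:

```python
from io import StringIO

import pandas as pd

# Made-up sample rows standing in for the tower file (an assumption: the real
# file's values and exact column widths may differ)
sample = (
    "2020 03 01 00 00 00   3.2    270\n"
    "2020 03 01 00 01 00 ///// //////\n"
    "2020 03 02 12 30 00   5.1    180\n"
)

# Column names; 'Direction' is assumed from the original question's variables
data_columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second', 'Speed',
                'Direction']

# na_values tells Pandas to treat the slash-filled entries as missing (NaN)
df = pd.read_fwf(StringIO(sample), names=data_columns,
                 na_values=['/////', '//////'])

# Keep only the rows for one specific date, then print them
match = df[(df['Year'] == 2020) & (df['Month'] == 3) & (df['Day'] == 2)]
print(match)
```

With the real file, you would pass the object returned by urlopen instead of 
StringIO(sample) and keep the same filter.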

I will also show you some code I used to build an array by hand, as in the code 
you sent me. You were close! The only error I ran into was that the file has a 
few '/////' no-data entries, which float() in your code cannot convert. Here's 
what I did:

  from urllib.request import urlopen
  import numpy as np

  with urlopen('http://www.oswego.edu/met_class/tower/WND202003') as data:
      data_str = data.readlines()

  # Like before, let's scribble down our column headers, and this can help us
  # make an array to store everything
  data_columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second', 'Speed',
                  'Direction']

  # Let's make an array with columns for each of our individual entries from
  # the file,
  # as well as rows for each line we read in
  data_array = np.zeros((len(data_str), len(data_columns)))

  # Now, like you did, we can loop through the lines
  for line in range(len(data_str)):
      # For each line we will go ahead and split the line into a list of
      # individually decoded entries
      items = data_str[line].decode('UTF-8').split()
      # Now for every line we can loop through each of our split entries to
      # place them into a column of our data_array
      for column in range(len(data_columns)):
          # I found in your data that a small number of lines have "no data"
          # entries as '/////' or '//////',
          # so we will assign placeholder number -999 as our N/A for those
          if '/////' in items[column]:
              data_array[line, column] = -999.
          # if we don't see this non-number entry, we simply store the read-in
          # number in our array
          else:
              data_array[line, column] = float(items[column])

and from that I get back something that looks like attachment 2 for every row 
of this data_array. I hope this helps with your question!
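
If you stay with the NumPy array, a boolean mask is one way to pick out and 
print the lines for a particular date. This is a minimal sketch, again with 
made-up rows shaped like data_array (eight columns, -999. for missing values); 
find_rows is a hypothetical helper name, and the column order is assumed from 
the variable names in your code:

```python
import numpy as np

# Column order assumed from the variable names in the original question
data_columns = ['Year', 'Month', 'Day', 'Hour', 'Minute', 'Second', 'Speed',
                'Direction']

# Made-up rows shaped like data_array, with -999. marking missing values
data_array = np.array([
    [2020., 3., 1., 0., 0., 0., 3.2, 270.],
    [2020., 3., 1., 0., 1., 0., -999., -999.],
    [2020., 3., 2., 12., 30., 0., 5.1, 180.],
])


def find_rows(array, year, month, day):
    """Return every row whose first three columns match the given date."""
    mask = (array[:, 0] == year) & (array[:, 1] == month) & (array[:, 2] == day)
    return array[mask]


# Select all rows for 2020-03-01 and print each one with its column names
matches = find_rows(data_array, 2020, 3, 1)
for row in matches:
    print(dict(zip(data_columns, row)))
```

The same mask idea extends to the hour, minute, and second columns if you need 
to narrow the search down to a single line.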

All the best,


> Good afternoon,
> I'm emailing today because I am having some
> trouble with a python coding project and I was recommended to contact this
> email from Mr. Ryan May for any python coding questions and I would greatly
> appreciate any and all help I can get. I have three goals with the code and
> they are as follows: 1. Have my code read lines of data on a website (I
> already got that part) 2. Have the code find a line with specific numbers,
> and 3. Print out that specific line of data. The issue I'm having as of now
> is I can't get the program to read the data the way I want it to (2020 03
> 01 is a part of a line and i want the program to read 2020 as one column
> that is the year, 03 being the second column being the month, and 01 being
> the third column being the day etc...). Here is the code as well as a
> screenshot of the code in jupyter notebooks.
> Thank you!
> from urllib.request import urlopen
> with urlopen('http://www.oswego.edu/met_class/tower/WND202003') as data:
>     for line in data:
>         line = line.decode('utf-8')
>         if '2020' in line:
>             print(line)
> import numpy as N
> data_str = data.readlines()
> year = N.zeros(len(data_str), 'f')
> month = N.zeros(len(data_str), 'f')
> day = N.zeros(len(data_str), 'f')
> hour = N.zeros(len(data_str), 'f')
> minute = N.zeros(len(data_str), 'f')
> second = N.zeros(len(data_str), 'f')
> speed = N.zeros(len(data_str), 'f')
> direction = N.zeros(len(data_str), 'f')
> for i in range(len(data_str)):
>     split_istr = data_str[i].split('\t')
>     year[i] = float(split_istr[0])
>     month[i] = float(split_istr[1])
>     day[i] = float(split_istr[2])
>     hour[i] = float(split_istr[3])
>     minute[i] = float(split_istr[4])
>     second[i] = float(split_istr[5])
>     speed[i] = float(split_istr[6])
>     direction[i] = float(split_istr[7])

Ticket Details
Ticket ID: HGQ-627384
Department: Support Python
Priority: Low
Status: Closed
NOTE: All email exchanges with Unidata User Support are recorded in the Unidata 
inquiry tracking system and then made publicly available through the web.  If 
you do not want to have your interactions made available in this way, you must 
let us know in each email you send to us.

Attachment: Screen Shot 2020-04-09 at 1.54.51 PM.png
Description: PNG image

Attachment: Screen Shot 2020-04-09 at 2.06.27 PM.png
Description: PNG image