Robert Jaeger

673 days ago
Edited by Sophia B Liu, Robert Jaeger 673 days ago
#GeoViz Notes
Sophia L Proof-of-Concepts
  • Accessed the USGS data on minerals and environmental health...
  • Not presented in a common format
  • No real standard
Robert J
  • Extracting the data and making it usable takes a lot of work. PDF data can be scraped, but the result lacks useful data delimiters, so data sets require semi-manual 'cleaning' before they can be used in visualization or analysis tools.
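A minimal sketch of that semi-manual cleaning step in Python. It assumes the scraped text keeps each table row on one line and separates columns with runs of two or more spaces. That is an assumption, since real PDF extraction output varies. The helper names `split_scraped_lines`, `flag_ragged`, and `to_csv` are hypothetical. Rows with an unexpected cell count are flagged for manual review rather than guessed at.

```python
import csv
import io
import re

# Assumption: columns in the scraped text are separated by runs of two or more
# whitespace characters; single spaces only occur inside a cell value.
COLUMN_GAP = re.compile(r"\s{2,}")


def split_scraped_lines(text: str) -> list[list[str]]:
    """Split plain text scraped from a PDF table into rows of cells."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines left over from the page layout
        rows.append(COLUMN_GAP.split(line))
    return rows


def flag_ragged(rows: list[list[str]], expected: int) -> list[int]:
    """Return indices of rows whose cell count differs from `expected`."""
    return [i for i, row in enumerate(rows) if len(row) != expected]


def to_csv(rows: list[list[str]]) -> str:
    """Serialize rows as CSV so downstream tools get explicit delimiters."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

The flagged rows are where the "semi-manual" part comes in: a person checks them against the PDF before the CSV is handed to a visualization tool.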
Sophia L
  • Some datasets are directly consumable, but not all; it varies
  • No index, context, or metadata file; no map of where the data is
  • Data not presented in the most consumable way
  • Linking to where the data came from, such as the spreadsheet
  • Suggesting an approach across the agency, department, or federal government
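One way to provide the missing index and the link back to the source, sketched in Python: a small JSON index with one record per dataset. The `DatasetRecord` fields and `build_index` are hypothetical. An agency-wide approach would more likely adopt an existing catalog vocabulary such as W3C DCAT than invent its own fields.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetRecord:
    """One entry in a hypothetical machine-readable data index.

    Field names are illustrative assumptions, not a USGS or federal standard.
    """
    title: str
    source_url: str            # where the underlying data (e.g. a spreadsheet) lives
    derived_from: str = ""     # e.g. the PDF report the table was published in
    file_format: str = "csv"
    keywords: list[str] = field(default_factory=list)


def build_index(records: list[DatasetRecord]) -> str:
    """Serialize records as a JSON index that tools can search and link from."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)
```

Because each record carries `source_url`, a reader of the PDF (or a tool) can jump straight to the data that produced it, which matches the idea that pointing to existing data may be enough.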
Robert J
  • The purpose of a PDF is a semi-platform-agnostic presentation for human eyes, not further digital analysis.
Sophia L
  • Want to access the lower-level data, not only the high-level stuff
  • Just pointing to the data that already exists to produce the PDF may be sufficient in many cases.
Robert J
  • Consumability of data is paramount
  • Overall, the large amount of high-quality data is excellent. The challenges are: 1. finding the data of interest, which includes high-level site and file formatting, indices, etc.; 2. consuming those data, which includes 'inside the file' data formatting, ease of extraction into CSV or a similar format, presence and quality of metadata, attributions, etc., and the potential for 'marking' or noting where data were found, for future reference (some datasets and individual elements can be linked to via URL, others cannot).
  • Adopting indexed, hierarchical standards at least enterprise-wide, and preferably across the federal government, would dramatically improve data usability for all consumers. Standardization would also supply the most obvious metric for two purposes: demonstrating compliance with government open data requirements (and thus heading off potential legal questions and accusations), and enabling and encouraging growth in the value of the 'USGS' brand on these data. These uses of the metric are also the easiest and clearest way for internal USGS elements to demonstrate value for professional use.
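If a standard fixes a set of required metadata fields, the compliance metric can be as simple as the share of indexed datasets that fill them all in. A minimal Python sketch: `REQUIRED_FIELDS`, `metadata_completeness`, and `missing_fields` are hypothetical, not an official USGS or federal requirement.

```python
# Hypothetical set of fields a standard might require; not an official list.
REQUIRED_FIELDS = ("title", "source_url", "file_format", "attribution")


def metadata_completeness(records: list[dict], required: tuple[str, ...] = REQUIRED_FIELDS) -> float:
    """Return the fraction of records in which every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(
        1 for record in records
        if all(record.get(name) for name in required)
    )
    return complete / len(records)


def missing_fields(record: dict, required: tuple[str, ...] = REQUIRED_FIELDS) -> list[str]:
    """List the required fields that a single record leaves empty or omits."""
    return [name for name in required if not record.get(name)]
```

The single ratio is what would be reported for compliance; `missing_fields` gives each data owner a concrete to-do list.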
Data Issues That Arose during the Hackathon
  • The "Site code" column often contains superscript footnote markers. In Table 19 this causes problems: when additional samples were collected at a site, an additional row is included for it, and the site code is superscripted with a "1". :(
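A minimal Python sketch for the Table 19 issue. It assumes the superscript survives extraction as a Unicode superscript digit (e.g. "¹"); if extraction flattens it to an ordinary "1", it can no longer be separated reliably from the code itself. Treating footnote "1" as marking an additional-samples row follows the note above and applies only to Table 19. `split_site_code` and `tag_rows` are hypothetical names.

```python
import re
from typing import Optional

# Assumption: footnote markers arrive as Unicode superscript digits at the end
# of the site code. Plain ASCII digits are left alone, since they may be part
# of the code itself.
SUPERSCRIPT_DIGITS = "⁰¹²³⁴⁵⁶⁷⁸⁹"
TO_ASCII = str.maketrans(SUPERSCRIPT_DIGITS, "0123456789")
TRAILING_SUPERSCRIPT = re.compile(f"([{SUPERSCRIPT_DIGITS}]+)$")


def split_site_code(raw: str) -> tuple[str, Optional[str]]:
    """Return (site_code, footnote) with any trailing superscript removed."""
    raw = raw.strip()
    match = TRAILING_SUPERSCRIPT.search(raw)
    if match is None:
        return raw, None
    footnote = match.group(1).translate(TO_ASCII)
    return raw[: match.start()].rstrip(), footnote


def tag_rows(site_codes: list[str]) -> list[dict]:
    """Tag each row with its cleaned code and whether it is a supplemental-sample row.

    The footnote "1" = additional samples mapping is specific to Table 19.
    """
    tagged = []
    for raw in site_codes:
        code, note = split_site_code(raw)
        tagged.append({"site_code": code, "supplemental": note == "1"})
    return tagged
```

With the marker split off, the extra row shares its site code with the original row, so the two can be grouped or joined on `site_code` instead of being treated as different sites.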

