Data Analysis with Open Source Tools (基于开源工具的数据分析)




Book Information


Author: Philipp K. Janert

Publisher: Southeast University Press (东南大学出版社); 1st edition (June 1, 2011)

Original title: Data Analysis with Open Source Tools (O'Reilly Media, Inc.)

Paperback: 509 pages

Text language: English

Format: 16mo

ISBN: 7564126744, 9787564126742

Barcode: 9787564126742

Product dimensions and weight: 23 x 17.6 x 2.4 cm; 839 g

Summary


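Collecting data is relatively easy, but turning raw information into something useful requires knowing how to extract precisely what you need. Through the in-depth treatment in Data Analysis with Open Source Tools (reprint edition) by Philipp K. Janert, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You will learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then how to feed your understanding back into your organization through business plans, accurate metrics reports, and other means.

You will gradually work through the concepts in the hands-on workshops at the end of each chapter of Data Analysis with Open Source Tools (reprint edition). Above all, you will learn how to think about the data you want to obtain, rather than relying on tools to think for you.

Editor's Recommendation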


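Data Analysis with Open Source Tools (reprint edition) by Philipp K. Janert shows how to: use graphics to describe data with one, two, or dozens of variables; develop conceptual models using back-of-the-envelope calculations as well as scaling and probability arguments; mine data with computationally intensive methods such as simulation and clustering; make your conclusions easier to understand through reports, dashboards, and other metrics programs; understand financial calculations, including the time value of money; use dimensionality reduction techniques or predictive analytics to overcome the challenges that arise in data analysis; and become familiar with different open source programming environments for data analysis.

Table of Contents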


PREFACE

1 INTRODUCTION

Data Analysis

What's in This Book

What's with the Workshops?

What's with the Math?

What You'll Need

What's Missing

PART I Graphics: Looking at Data

2 A SINGLE VARIABLE: SHAPE AND DISTRIBUTION

Dot and Jitter Plots

Histograms and Kernel Density Estimates

The Cumulative Distribution Function

Rank-Order Plots and Lift Charts

Only When Appropriate: Summary Statistics and Box Plots

Workshop: NumPy

Further Reading

3 TWO VARIABLES: ESTABLISHING RELATIONSHIPS

Scatter Plots

Conquering Noise: Smoothing

Logarithmic Plots

Banking

Linear Regression and All That

Showing What's Important

Graphical Analysis and Presentation Graphics

Workshop: matplotlib

Further Reading

4 TIME AS A VARIABLE: TIME-SERIES ANALYSIS

Examples

The Task

Smoothing

Don't Overlook the Obvious!

The Correlation Function

Optional: Filters and Convolutions

Workshop: scipy.signal

Further Reading

5 MORE THAN TWO VARIABLES: GRAPHICAL MULTIVARIATE ANALYSIS

False-Color Plots

A Lot at a Glance: Multiplots

Composition Problems

Novel Plot Types

Interactive Explorations

Workshop: Tools for Multivariate Graphics

Further Reading

6 INTERMEZZO: A DATA ANALYSIS SESSION

A Data Analysis Session

Workshop: gnuplot

Further Reading

PART II Analytics: Modeling Data

7 GUESSTIMATION AND THE BACK OF THE ENVELOPE

Principles of Guesstimation

How Good Are Those Numbers?

Optional: A Closer Look at Perturbation Theory and Error Propagation

Workshop: The Gnu Scientific Library (GSL)

Further Reading

8 MODELS FROM SCALING ARGUMENTS

Models

Arguments from Scale

Mean-Field Approximations

Common Time-Evolution Scenarios

Case Study: How Many Servers Are Best?

Why Modeling?

Workshop: Sage

Further Reading

9 ARGUMENTS FROM PROBABILITY MODELS

The Binomial Distribution and Bernoulli Trials

The Gaussian Distribution and the Central Limit Theorem

Power-Law Distributions and Non-Normal Statistics

Other Distributions

Optional: Case Study--Unique Visitors over Time

Workshop: Power-Law Distributions

Further Reading

10 WHAT YOU REALLY NEED TO KNOW ABOUT CLASSICAL STATISTICS

Genesis

Statistics Defined

Statistics Explained

Controlled Experiments Versus Observational Studies

Optional: Bayesian Statistics--The Other Point of View

Workshop: R

Further Reading

11 INTERMEZZO: MYTHBUSTING--BIGFOOT, LEAST SQUARES, AND ALL THAT

How to Average Averages

The Standard Deviation

Least Squares

Further Reading

PART III Computation: Mining Data

12 SIMULATIONS

A Warm-Up Question

Monte Carlo Simulations

Resampling Methods

Workshop: Discrete Event Simulations with SimPy

Further Reading

13 FINDING CLUSTERS

What Constitutes a Cluster?

Distance and Similarity Measures

Clustering Methods

Pre- and Postprocessing

Other Thoughts

A Special Case: Market Basket Analysis

A Word of Warning

Workshop: Pycluster and the C Clustering Library

Further Reading

14 SEEING THE FOREST FOR THE TREES: FINDING IMPORTANT ATTRIBUTES

Principal Component Analysis

Visual Techniques

Kohonen Maps

Workshop: PCA with R

Further Reading

15 INTERMEZZO: WHEN MORE IS DIFFERENT

A Horror Story

Some Suggestions

What About Map/Reduce?

Workshop: Generating Permutations

Further Reading

PART IV Applications: Using Data

16 REPORTING, BUSINESS INTELLIGENCE, AND DASHBOARDS

Business Intelligence

Corporate Metrics and Dashboards

Data Quality Issues

Workshop: Berkeley DB and SQLite

Further Reading

17 FINANCIAL CALCULATIONS AND MODELING

The Time Value of Money

Uncertainty in Planning and Opportunity Costs

Cost Concepts and Depreciation

Should You Care?

Is This All That Matters?

Workshop: The Newsvendor Problem

Further Reading

18 PREDICTIVE ANALYTICS

Introduction

Some Classification Terminology

Algorithms for Classification

The Process

The Secret Sauce

The Nature of Statistical Learning

Workshop: Two Do-It-Yourself Classifiers

Further Reading

19 EPILOGUE: FACTS ARE NOT REALITY

A PROGRAMMING ENVIRONMENTS FOR SCIENTIFIC COMPUTATION AND DATA ANALYSIS

Software Tools

A Catalog of Scientific Software

Writing Your Own

Further Reading

B RESULTS FROM CALCULUS

Common Functions

Calculus

Useful Tricks

Notation and Basic Math

Where to Go from Here

Further Reading

C WORKING WITH DATA

Sources for Data

Cleaning and Conditioning

Sampling

Data File Formats

The Care and Feeding of Your Data Zoo

Skills

Terminology

Further Reading

INDEX