Guide to Obtaining and Applying Securities Data Based on the BaoStock Platform
Preface: Background of Data Demand in Quantitative Investment
In today's rapidly developing financial technology landscape, quantitative investment has become a widely adopted strategy among institutional and individual investors. The core of quantitative investment lies in systematically analyzing financial markets through mathematical models and computer technology, all built upon high-quality financial data acquisition. Traditional financial data service providers often charge exorbitant fees, creating a high barrier for individual researchers and small institutions. As the first completely free securities data platform in China, BaoStock significantly lowers the threshold for quantitative research and makes an important contribution to democratizing financial data.
Chapter 1 Overview of the BaoStock Platform
1.1 Platform Positioning and Core Advantages
BaoStock is a free, open-source securities data platform accessed through a Python API. It requires no user registration: developers can obtain many types of securities market data directly through simple interface calls. The platform uses the Pandas DataFrame as its standard return format, which greatly simplifies subsequent data processing and analysis. Although BaoStock's coverage of certain niche areas may fall short of commercial platforms, it provides enough A-share market data to meet most basic quantitative research needs.
From a technical architecture perspective, BaoStock adopts a lightweight service design: the Python client sends requests to the platform's servers and receives results with fast response times and good stability. The backend database is updated regularly to keep the data timely and accurate. Notably, personal users face no restrictions on call frequency, which is rare among free data services.
1.2 Detailed Explanation of Data Coverage
Currently, BaoStock mainly covers Chinese A-share market data across several categories:
- Stock trading data: complete daily K-line (candlestick) data from December 19, 1990 to the present, along with weekly and monthly K-lines. For intraday analysis, it also offers K-line data at intervals from 5 to 60 minutes, starting from July 26, 1999; this higher-frequency data is valuable for developing short-term trading strategies.
- Index data: the platform includes a range of indices dating back to January 1, 2006, such as composite indices, size indices, primary and secondary industry indices, strategy indices, growth indices, value indices, and thematic indices, providing a solid foundation for analyzing overall market trends and studying industry rotation.
- Financial data: financial statements are a crucial basis for quantitative stock selection. BaoStock provides quarterly financial data from 2007 onward, including core data from listed companies' balance sheets, cash flow statements, and income statements, as well as the key indicators of the DuPont analysis framework. In addition, the platform includes listed companies' earnings forecasts from 2003 onward and preliminary earnings reports from 2006 onward; this information is especially important for developing event-driven strategies.
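As an illustration of how the quarterly financial data might be collected, here is a minimal sketch. It assumes that BaoStock's quarterly financial queries are issued per reporting period, with `query_profit_data(code, year, quarter)` used as the example call, and that the caller has already logged in and passes the `baostock` module in as `bs`. The helper names `iter_quarters` and `fetch_quarterly_rows` are hypothetical and introduced only for this example.

```python
from typing import Iterator, List, Tuple

# Per the coverage list above, quarterly financial data starts in 2007.
FIRST_FINANCIAL_YEAR = 2007


def iter_quarters(start_year: int, end_year: int) -> Iterator[Tuple[int, int]]:
    """Yield (year, quarter) pairs, clamped to the first year with data."""
    first_year = max(start_year, FIRST_FINANCIAL_YEAR)
    for year in range(first_year, end_year + 1):
        for quarter in (1, 2, 3, 4):
            yield year, quarter


def fetch_quarterly_rows(bs, code: str, start_year: int, end_year: int) -> List[List[str]]:
    """Collect profitability rows for one stock, one quarter at a time.

    `bs` is assumed to be the already logged-in baostock module, or any
    object exposing query_profit_data with the same keyword arguments.
    """
    rows: List[List[str]] = []
    for year, quarter in iter_quarters(start_year, end_year):
        rs = bs.query_profit_data(code=code, year=year, quarter=quarter)
        if rs.error_code != "0":
            continue  # skip quarters the server could not answer
        while rs.next():
            rows.append(rs.get_row_data())
    return rows
```

Passing the module in as an argument keeps the helper easy to exercise with a stand-in object, so the looping logic can be checked without a network connection.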
Chapter 2 Installation & Configuration of the Platform
2.1 Environment Preparation & Installation Process
Before using BaoStock, make sure a Python environment is correctly installed. Python 3.6 or later is recommended for best compatibility. Installation itself is straightforward; run the following pip command on the command line:

    pip install baostock -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn

This command uses the Tsinghua University mirror, which significantly improves download speeds within China. After installation, it is advisable to verify that the package works by running the following test code in an interactive Python session:

    import baostock as bs
    bs.login()

A "login success!" message indicates that installation succeeded and that all of the platform's functions can be used normally. Note that BaoStock uses a session management mechanism: prolonged inactivity can cause the session to time out, in which case you must log in again before fetching more data.
2.2 Usage Suggestions & Precautions
Although there are no limits on call frequency, developers should build reasonable request intervals into their code so that server resources are used responsibly. Large-scale historical downloads are best run outside trading hours to reduce the load on the servers. Multi-threaded downloading is not currently supported, so when large amounts of data are needed, consider using Python's multiprocessing module to parallelize the downloads.
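The request-interval and multiprocessing suggestions can be sketched roughly as follows. This is only an illustration using the standard library: the helper names `call_with_interval` and `split_into_chunks` are hypothetical, and the 0.2-second default pause is an arbitrary assumption, not a value recommended by BaoStock.

```python
import time
from typing import Callable, Iterable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_with_interval(
    func: Callable[[T], R],
    items: Iterable[T],
    interval_s: float = 0.2,  # assumed pause; tune to your own workload
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[R]:
    """Call func on each item, pausing interval_s seconds between calls."""
    first = True
    for item in items:
        if not first:
            sleep(interval_s)
        first = False
        yield func(item)


def split_into_chunks(items: Sequence[T], n_chunks: int) -> List[List[T]]:
    """Split items into at most n_chunks contiguous lists of near-equal size."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    size, extra = divmod(len(items), n_chunks)
    chunks: List[List[T]] = []
    start = 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # drop empty chunks when there are few items
            chunks.append(list(items[start:end]))
        start = end
    return chunks
```

Each chunk could then be handed to one worker process, for example through `multiprocessing.Pool.map`, with every worker logging in on its own and pacing its requests with `call_with_interval`.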
Data updates follow a clear schedule: daily K-line data for the current trading day is loaded after the market closes, at around 17:30; minute-level K-line data is loaded at around 20:30; and the previous trading day's financial report data is loaded at around 01:30 the next morning. Knowing these times is critical when building real-time monitoring systems.
Chapter 3 Core API Functionality Explained
3.1 Basic Query Functions
The main APIs include login(), query_all_stock(), and query_history_k_data_plus(). login() establishes the connection to the server. It requires no authentication details, but it is a prerequisite for using every other function. The typical usage pattern is log in → query → log out, which keeps operations both secure and simple.
query_all_stock() retrieves the full list of securities for a specified date, which is useful for building a stock pool. The result includes the security code, trading status, name, and other basic information. Take particular care with non-trading days: querying one returns an empty DataFrame, which must be handled carefully in backtesting scenarios.
3.2 Historical K-Line Data Acquisition
query_history_k_data_plus() retrieves several kinds of K-line data and supports a rich set of parameters:
- code: the security code in "market.code" format; for example, sh.600000 is Shanghai Pudong Development Bank on the Shanghai Stock Exchange.
- fields: the indicators to return. Up to seventeen are available, including closing price, volume, and turnover rate, and they can be selected flexibly according to actual needs.
- frequency: the K-line type, ranging from 5-minute to monthly.
- adjustflag: the price adjustment mode, which keeps long-term historical data consistent.
For trading suspensions and other special situations, the tradestatus field identifies whether the stock was trading normally, so suspended periods can be filtered out before they distort backtest results.
Chapter 4 Typical Application Scenario Examples
4.1 Complete Data Retrieval Workflow
The following code snippet demonstrates the complete workflow, from logging in through fetching results:

    import baostock as bs
    import pandas as pd

    # Establish the connection
    lg = bs.login()
    print(f"Login status: {lg.error_msg}")

    # Build the query parameters
    stock_code = "sh.600000"
    fields = "date,open,high,low,close,volume,amount,pctChg"
    start_date = "2022-01-01"
    end_date = "2022-12-31"

    # Fetch the historical data
    rs = bs.query_history_k_data_plus(
        stock_code,
        fields,
        start_date=start_date,
        end_date=end_date,
        frequency="d",   # daily K-lines
        adjustflag="3",  # "3": no adjustment (library default); "1": backward-adjusted; "2": forward-adjusted
    )

    # Collect the returned rows
    data_list = []
    while (rs.error_code == "0") and rs.next():
        data_list.append(rs.get_row_data())
    result = pd.DataFrame(data_list, columns=rs.fields)

    # Post-process the result: parse dates and convert prices to floats
    result["date"] = pd.to_datetime(result["date"])
    price_cols = ["open", "high", "low", "close"]
    result[price_cols] = result[price_cols].astype(float)

    # Close the session
    bs.logout()

This snippet shows how to retrieve one year of daily K-line data for a specific stock and convert the columns to appropriate data types. In real applications, additional exception handling logic ensures that network fluctuations do not cause unexpected program exits.
4.2 Parallel Processing Framework for Multi-Stock Queries
Although BaoStock does not support multi-threaded queries, multiprocessing allows concurrent retrieval, which improves efficiency when collecting data for the whole market. A framework is presented below:

    from multiprocessing import Pool

    import baostock as bs

    def fetch_single_stock(code):
        # Each worker process opens and closes its own session
        bs.login()
        # data retrieval logic
        bs.logout()
        return data

    if __name__ == '__main__':
        stock_list = ["sh.600000
