Abstract: To address the vague wording and unclear entity boundaries found in open Internet data, this paper constructs a corpus, Space-Corpus, and proposes a named entity recognition model based on BERT + Bi-LSTM + CRF. The bidirectional encoder representations from Transformers (BERT) model, built on a bidirectionally trained Transformer, generates vector representations of the input corpus. A bi-directional long short-term memory (Bi-LSTM) network then extracts context features from these representations, and a conditional random field (CRF) decodes and annotates the sequence, outputting the label sequence with the highest score. Experimental results show that the proposed model outperforms the BERT, BERT + Bi-LSTM, and CNN + Bi-LSTM + CRF models in accuracy, recall, and F1 score on Space-Corpus.
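The final decoding step, in which the CRF outputs the label sequence with the highest score, can be illustrated with a minimal Viterbi sketch. This is not the paper's implementation: the function name `viterbi_decode`, the label set, and all scores below are hypothetical. In the proposed model the per-token emission scores would come from the Bi-LSTM layer on top of BERT and the transition scores would be learned by the CRF; here both are hard-coded for illustration.

```python
from typing import Dict, List, Tuple


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    labels: List[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring label sequence and its total score."""
    if not emissions:
        return [], 0.0
    # best[t][y]: score of the best path that ends in label y at position t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []  # back[t-1][y]: best previous label for y at t
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Missing transition entries are treated as a score of 0.0.
            prev = max(labels, key=lambda p: best[-1][p] + transitions.get((p, y), 0.0))
            scores[y] = best[-1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    last = max(labels, key=lambda y: best[-1][y])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical BIO label set and scores for a three-token sentence.
LABELS = ["O", "B-ENT", "I-ENT"]
EMISSIONS = [
    {"O": 0.1, "B-ENT": 2.0, "I-ENT": 0.5},
    {"O": 0.3, "B-ENT": 0.2, "I-ENT": 1.5},
    {"O": 2.0, "B-ENT": 0.1, "I-ENT": 0.4},
]
TRANSITIONS = {
    ("O", "I-ENT"): -10.0,    # an entity cannot continue without a beginning
    ("B-ENT", "I-ENT"): 1.0,  # continuing an entity right after B-ENT is favored
}

path, score = viterbi_decode(EMISSIONS, TRANSITIONS, LABELS)
print(path, score)
```

With these assumed scores the decoded sequence is `B-ENT, I-ENT, O`. The transition scores are what distinguish CRF decoding from picking the best label per token independently: the penalty on `O` followed by `I-ENT` prevents an entity from starting in the middle, which is one way a CRF layer can help when entity boundaries are unclear.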