R: Reading a delimited table when the end of each row is not delimited

r database text

80 views

2 answers

Author reputation: 682

I am trying to read in the Census' Geographic Boundary Change Notes. As one can see at the hyperlink, the file is a K x 11 table. A pipe-delimited text version is available via the link on that page.

I tried manually saving the pipe-delimited text version as a .txt file (e.g. foo.txt) and then reading it in as a pipe-delimited table via:

data <- read.table("foo.txt", sep="|")

However, this produces an error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 177 did not have 11 elements

When I scrolled down to what I think is line 177, I did not see anything missing. I then looked at the raw text for potential issues, and one is that the pipes do not appear to delimit individual rows, i.e. nothing in the raw .txt file indicates the end of a table row. However, this seems to contradict the fact that read.table() was expecting 11 elements.

  1. Do I need to add to the .txt file a delimiter corresponding to the end of each row of the table?

  2. If so, how might I do this without manually adding a delimiter?

Apologies if this is not the problem.

Author: user3614648 Posted: 19.07.2016 04:03

Answers (2)


1 upvote

Author reputation: 682

Accepted answer

There was no issue with delimiting. Instead, I downloaded the .txt file and opened it in Microsoft Excel using '|' as the delimiter. When I scrolled down to the rows that had problems, it appeared that Spanish characters were the cause.

Author: user3614648 Posted: 19.07.2016 04:53

1 upvote

Author reputation: 446

Line 177 refers to Clark's Point city. Since the default quote argument for read.table is quote = "\"'", the ' in Clark's is read as the start of a character string. Everything after that ', including line breaks, is read as part of one character string until read.table reaches line 337, where it finds "Changed to Tohono O'odham Nation" and reads the ' in O'odham as the closing quote for the string that started on line 177.
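
To see this behaviour on a small scale, here is a minimal sketch using count.fields(), which reports how many fields each line has under a given quote setting. The three-row sample and the temporary file are hypothetical stand-ins for the Census file, not its actual contents.

```r
# Hypothetical 3-column sample; only the two apostrophes matter here
sample_lines <- c(
  "1|Clark's Point city|a",
  "2|Some other place|b",
  "3|Tohono O'odham Nation|c"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

# Default quote set: the ' in Clark's opens a quoted string that
# runs across line breaks until the ' in O'odham closes it
n_default <- count.fields(tmp, sep = "|")

# Quoting disabled: each line is split on "|" only
n_noquote <- count.fields(tmp, sep = "|", quote = "")

print(n_default)
print(n_noquote)
```

Comparing the two vectors against the real file should point to the lines where an apostrophe opens or closes an unintended quoted string.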

Disabling quoting with read.table("foo.txt", sep="|", quote = NULL) should fix the issue.
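
As a rough check of this fix, the sketch below reads a small hypothetical sample with quoting turned off. It uses quote = "", which is the form the read.table documentation gives for disabling quoting; the sample rows and temporary file are assumptions for illustration, not the real Census data.

```r
# Hypothetical sample with apostrophes in the second column
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "1|Clark's Point city|a",
  "2|Some other place|b",
  "3|Tohono O'odham Nation|c"
), tmp)

# quote = "" disables quoting, so apostrophes are ordinary characters
data <- read.table(tmp, sep = "|", quote = "", stringsAsFactors = FALSE)

print(dim(data))
print(data$V2)
```

With quoting disabled, each physical line becomes one row and the apostrophes stay in the text.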

Author: Josh Posted: 27.08.2018 05:24