How To Delete The \n\t\t\t In The Result From Website Data Collection?
Solution 1:
I worry a bit about removing all tabs but this would do it:
> reviews <- "VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
> reviews <- gsub( "\\\t", "", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\n\nDocument Camera\n\n"
Read ?regex and keep in mind that both R and the regex engine use "\" as an escape character, so a pattern goes through two levels of parsing before it is matched. The triple-backslash form "\\\t" works, but for tabs and newlines "\t" and "\\t" are sufficient: with "\t" R passes a literal tab to the regex engine, and with "\\t" the engine interprets the escape itself. The replacement argument is not parsed as a pattern, so you don't need to use doubled escapes there. So if you then wanted to replace those "\n\n"'s with just one "\n" you could use:
> reviews <- gsub( "\\\n\\\n", "\n", reviews)
> reviews
[1] "VZ-C6 / VZ-C3D\nDocument Camera\n"
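The same two-step cleanup can be written with simpler patterns. This is a minimal sketch, assuming a shortened version of the string (fewer tabs) and using illustrative variable names (no_tabs_a, no_tabs_b, collapsed) that are not part of the original answer:

```r
# Shortened stand-in for the scraped string (fewer tabs, same structure).
reviews <- "VZ-C6 / VZ-C3D\n\t\t\nDocument Camera\n\t\t"

# "\t" hands the regex engine a literal tab; "\\t" lets the engine
# interpret the escape. Both remove every tab.
no_tabs_a <- gsub("\t", "", reviews)
no_tabs_b <- gsub("\\t", "", reviews)
identical(no_tabs_a, no_tabs_b)   # TRUE

# "\n+" collapses any run of newlines to one, not just pairs.
collapsed <- gsub("\n+", "\n", no_tabs_a)
collapsed
# [1] "VZ-C6 / VZ-C3D\nDocument Camera\n"
```

Using "\n+" rather than "\n\n" also handles runs of three or more newlines in a single pass.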
Solution 2:
The go-to functions for "find and replace" operations on strings in R are sub (to replace just the first instance) and gsub (to replace all instances). These functions search the string for a pattern given as a regular expression and replace each match with the replacement text.
For example:
s <-"VZ-C6 / VZ-C3D\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\tDocument Camera\n\t\t\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t"
gsub('\t|\n', '', s)
[1] "VZ-C6 / VZ-C3DDocument Camera"
The alternation operator (|) in the pattern above, \t|\n, means that either \t or \n is matched, and the second argument of '' says to replace matches with an empty string (i.e. nothing).
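Replacing with an empty string joins "VZ-C3D" and "Document" with nothing between them. If the words should stay separated, one option is to replace each run of tabs and newlines with a single space and then trim the ends. This is a sketch under that assumption, using a shortened version of the string and a hypothetical variable name (cleaned):

```r
# Shortened stand-in for the scraped string.
s <- "VZ-C6 / VZ-C3D\n\t\t\n\t\tDocument Camera\n\t\t\n\t\t"

# "[\t\n]+" matches one or more consecutive tabs/newlines; each run
# becomes a single space. trimws() (base R) then drops the trailing space.
cleaned <- trimws(gsub("[\t\n]+", " ", s))
cleaned
# [1] "VZ-C6 / VZ-C3D Document Camera"
```

The character class [\t\n] matches the same characters as \t|\n; the + is what collapses each run into one replacement.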
While s above contains just a single element, gsub and sub are vectorised and so will also work on an entire vector of arbitrary length.
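To illustrate the vectorised behaviour, here is a small sketch with a made-up character vector (x, all_removed and first_only are illustrative names, not from the original answer):

```r
# A made-up vector of scraped-looking strings.
x <- c("VZ-C6\n\t\t", "\t\tDocument Camera\n", "no whitespace")

# gsub removes every tab and newline in every element.
all_removed <- gsub("\t|\n", "", x)
all_removed
# [1] "VZ-C6"           "Document Camera" "no whitespace"

# sub removes only the first match in each element.
first_only <- sub("\t|\n", "", x)
first_only
# [1] "VZ-C6\t\t"           "\tDocument Camera\n" "no whitespace"
```

Elements without a match, such as the third one, are returned unchanged.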