All Questions
13
questions
0
votes
5
answers
158
views
How to bring the next line to the end of the first line using awk, separated by a comma?
I have downloaded some sequences from a publicly available database in .fa format. I want to generate a .csv file that contains the name of the sequence, and its length, separated by a comma.
This is ...
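One possible approach to the task described above, sketched in Python rather than awk. It assumes a plain FASTA file in which each record starts with a `>` header line followed by one or more sequence lines, and it takes the first whitespace-delimited token of the header as the sequence name. The function names and the sample records are hypothetical, chosen only for illustration.

```python
import io


def fasta_lengths(handle):
    """Yield (name, length) for each record in a FASTA-formatted stream."""
    name, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, length
            tokens = line[1:].split()
            # Assumption: the name is the first token after ">".
            name, length = (tokens[0] if tokens else ""), 0
        elif name is not None:
            # Sequence lines may be wrapped, so lengths are accumulated.
            length += len(line)
    if name is not None:
        yield name, length


def to_csv_lines(handle):
    """Return one 'name,length' string per record."""
    return [f"{name},{length}" for name, length in fasta_lengths(handle)]


# Hypothetical in-memory input standing in for a downloaded .fa file.
example = io.StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGGG\n")
rows = to_csv_lines(example)
print("\n".join(rows))
```

For a real file, the same function could be called on `open("sequences.fa")` (a placeholder filename) and the resulting lines written to a .csv file.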
-1
votes
4
answers
2k
views
Grep for a range of numbers
I have a .txt file with multiple lines that gives amino acid and residue data. The data looks like this:
ARG262-Side ASP368-Side 140,83%
ARG95-Side GLU107-Side 103,73%
ARG474-Side VAL468-Main 94,93%
...
1
vote
3
answers
94
views
Is there a way to extract lines that 3 files have in common based on one column?
I have 3 space-separated files with about 3.4 million lines (they don't have exactly the same number of lines, and they are sorted by the "Marker" column). They look like this:
head neu1
...
-1
votes
5
answers
3k
views
How to filter a column in a TSV file with billions of rows
I am working with a list containing billions of rows of data.
I have data like this:
As you can see, the fourth column (the gene column) contains gene names, but not all rows have a "gene name". I need to get ...
0
votes
7
answers
1k
views
Simplest way to extract a portion of a string?
I've got a file (bigfile.txt), one of the columns looks like this
NW_017095471.1 Gnomon mRNA 108321 109565 . + . ID=rna34;Parent=gene27;Dbxref=GeneID:108565285,Genbank:XM_017925071.1;...
8
votes
3
answers
15k
views
Extract lines that match a list of words in another file
I have file 1, which has these lines:
ATM 1434.972183
BMPR2 10762.78192
BMPR2 10762.78192
BMPR2 1469.14535
BMPR2 1469.14535
BMPR2 1738.479639
BMS1 4907.841667
BMS1 4907.841667
BMS1 880.4532628
BMS1 ...
2
votes
3
answers
6k
views
Print text before and after match, from a specific beginning and to an ending string
I am trying to extract entries from within a large Genbank file, with many thousands of entries. For a search string, I'm using a unique gene name – that works fine. The tricky bit is that I'd like to ...
3
votes
3
answers
780
views
Comparing two files linewise and if pattern of file 1 is not found (fully or partially) in file 2 then print line of file 1
I have two files:
file1 (search):
1
GACGGAGGATGCAAGTGTTATCCGGAATCACTGGGCGTAAAGTTTTTTTTT
2
GACGGAGGATGCAAGTGTTATCCGGAAT
3
GACGGAGGATGCAAGTGTTATCCGGAATCACTGGGCGTAAAGCGTCC
4
...
0
votes
2
answers
97
views
Extract value from formatted text with bash
I've got a .fasta file, which is strictly formatted text containing some information about DNA.
Here's its common structure:
>NODE_18_length_75451_cov_83.3021
...
0
votes
3
answers
700
views
Extract a block of text from another file
I want to extract blocks of text based on IDs present in another file.
Input
>Feature scaffold1
1 100 g
101 200 g
201 300 g
500 500 r
900 1000 r
>Feature scaffold2
1 100 g
01 500 g
200 ...
0
votes
2
answers
698
views
Find and replace lines in text file with output from another file
I have two files A and B.
File A
>Node1
...
>Node2
...
File B
>gb|KY551314.1| Influenza A virus (A/mallard/Idaho/AH0011522/2015(H7N7)) segment 2 polymerase PB1 (PB1) and ...
1
vote
2
answers
174
views
Deconstructing one line into two lines based on specific columns
I have a .tsv file (batch_1.catalog.tags.tsv) consisting of 1,965,056 lines of 14 columns. I want to break some of these into two lines.
The first line: starts with a greater than sign (>) ...
2
votes
0
answers
395
views
Use gff2fasta instead of a bash script to get parts of DNA sequences out of a full genome
EDIT and a solution
Because my original question was badly phrased and I was trying to reinvent the wheel, I am now answering my own question (maybe it helps someone else):
gff2fasta is a tool which ...